I. INTRODUCTION


Forms 1a and 1b of the Army General Classification Test have
recently been released into the public domain. These and later 
forms of this test which are still in use have been administered 
to more than 12,000,000 men. During the war the Army 
General Classification Test was the major instrument used in the 
initial classification and assignment of all enlistees and Selective 
Service inductees. 

The first two forms were constructed under considerable pres- 
sure, by an insufficient and shifting staff, and on the basis of data 
which were recognized as inadequate in several respects. As a 
result the distributions of item difficulties of these forms are not 
as symmetrical as those of later forms, but despite this fact 
Forms 1a and 1b have proved to be quite reliable and valid in a
wide variety of situations. 

The initial standardization of Form 1a, though accomplished
under a variety of handicaps, turned out to be remarkably
accurate. It seems probable that the Army standard score
scale, which is now familiar to hundreds of psychologists, or the
adjusted Army standard score scale proposed in this paper, may
become a recognized standard for reporting the scores on adult
intelligence tests.

*The Army General Classification Test was the work of many people.
None of those who participated in the construction of Forms 1a and 1b were
with the Personnel Research Section at the time when the preparation of
this report was begun, but one of them returned some time before it was
finished. It was prepared by Dr. Edward E. Cureton, mainly on the basis
of the original unpublished technical reports, memoranda, and data sheets.

Wide use of the Army General Classification Test by the 
psychological profession and the general public is to be expected. 
Civilian adaptations and revisions will no doubt appear shortly. 
A reprinting of Form 1a with slight modifications in format has
appeared already. The purpose of this paper is to present all of 
the available data concerning the Army General Classification 
Test which are likely to be valuable to those who may wish to 
use it or to prepare revisions or adaptations of Forms 1a and 1b.
In some cases the data are presented in the forms in which they 
were originally reported in unpublished War Department 
studies. In others they have been reworked and presented in 
forms more useful to the psychological profession and the prospec- 
tive users of the test. For the further information of psycholo- 
gists and prospective users it may be noted that twelve complete 
sample sets of AGCT la and 1b, including the two test booklets, 
the practice booklet, the answer sheet, the two scoring stencils 
and the manual, have been placed in the Library of Congress. 


II. CONSTRUCTION 


The development of items of several types was commenced 
in the Spring of 1940. A report was made on May 24 of that 
year to an advisory committee of the National Research Council 
and plans were drafted for the general set-up of the examination. 
It was decided that the test should consist of verbal, quantitative, 
and spatial items in approximately equal numbers; that it should 
be set up in cycle-omnibus form; that the total time of administra- 
tion should not exceed one hour; and that the results should be 
expressed in terms of standard scores rather than mental ages 
and IQ’s.

On June 6, 1940, Trial Form A was sent to the printers. This 
form contained fifty vocabulary, fifty figure grouping (inductive 
reasoning), and fifty ‘figures’ (spatial) items in separate subtests. 
The figure grouping and ‘figures’ items proved unsatisfactory 
in preliminary try-out because their ranges of difficulty were too 
narrow. Trial Form B, consisting of fifty vocabulary items,
fifty arithmetic reasoning items, and fifty block-counting items,
also in separate subtests, was published shortly afterward.
Experimentation with this form had hardly begun when it 
became apparent that a test for service use would be needed 
almost immediately. Form 1a was therefore prepared from the
items of Trial Form B, the vocabulary items of Trial Form A, 
and a few additional items for which there is no evidence of any 
preliminary try-out. The actual selection and arrangement of 
the items was done by a group of test experts who were brought 
in as consultants, mainly on the basis of their own best judgment, 
though the data on the vocabulary items of Trial Form A 
were available. At a second meeting of the National Research 
Council committee on August 9, 1940, it was reported that the 
test had been finished, and problems of standardization were 
discussed. 

There are one hundred fifty items in each form of the test— 
fifty vocabulary, fifty arithmetic, and fifty block-counting. The 
items are in cycle-omnibus arrangement, with ten of each type 
in the first cycle and five of each type in all of the succeeding 
cycles. All items are of the multiple-choice type with four 
alternatives, and responses are recorded on an IBM answer sheet. 
A separate practice booklet containing ten items of each type is 
administered before the test proper. The following samples 
from the practice booklet illustrate the three types of items: 


To PERMIT is to (A) demand (B) thank (C) allow (D) charge


Tom sold 18 pints of milk at 9 cents a quart. How much money 
did he get for the milk? 
(A) 50¢ (B) 81¢ (C) $1 (D) $1.62 


How many blocks? 
(A) 5 (B) 4 (C) 3 (D) 6





The directions and practice exercises require ten to fifteen 
minutes, and exactly forty minutes are allowed for the test 
proper. The raw score is the number right minus one-third the 
number wrong. 
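
The scoring rule can be illustrated with a short sketch. With four alternatives per item, subtracting one-third of the wrong answers is the usual correction for chance. The function name is hypothetical, and treating omitted items as neither right nor wrong is an assumption not stated above.

```python
from fractions import Fraction

def raw_score(num_right: int, num_wrong: int) -> Fraction:
    """Number right minus one-third the number wrong.

    Omitted items are assumed to count neither as right nor as wrong.
    Fractions are used so that thirds are represented exactly.
    """
    return Fraction(num_right) - Fraction(num_wrong, 3)

# Hypothetical examinee: 90 right, 30 wrong, 30 omitted.
example = raw_score(90, 30)
print(example)
```

Whether fractional raw scores were rounded before conversion is not stated in this section, so the sketch leaves them exact.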

A new experimental battery, which was termed the Third
Trial Form, was prepared to provide items for Form 1b. This
battery consisted of two tests. One contained one hundred fifty
new vocabulary items, the other one hundred new arithmetic
items along with the fifty arithmetic items of Form 1a. The
battery was tried out on two experimental groups. The vocabulary
and arithmetic items of Form 1b were selected from the
new items of the Third Trial Form on the basis of the
item-analysis data from these groups. The new arithmetic items were
equated in difficulty to those of Form 1a by matching their
difficulty distributions to those of the Form 1a arithmetic items
embedded among the new ones of the Third Trial Form. In
order to equate the arithmetic items of Form 1a to its vocabulary
items, two additional item-analysis studies of Form 1a were
made. The new vocabulary items were selected to yield the
same distribution of difficulty, relative to that of the Form 1a
arithmetic items embedded in the Third Trial Form, as that
given by the Form 1a vocabulary items relative to the same
arithmetic items in the Form 1a studies. This procedure was
made necessary by the fact that the groups to whom the Third
Trial Form was given did not have the same ranges of ability
as those used in the item-analyses of Form 1a. The
item-analysis studies of Forms 1a and 1b are described in some detail
in the next section.

A new set of fifty block-counting items was prepared and 
administered experimentally to 238 CCC enrollees, but these 
items were not used in constructing Form 1b. They were used 
later in constructing Forms 1c and 1d, after more data had been 
obtained. It was finally decided to use the same block-counting 
items in Form 1b as in Form 1a, but they were rearranged in
more nearly correct order of difficulty, on the basis of the first
available item-analysis of Form 1a. The distribution of item
difficulties of Form 1b was matched as closely as possible to that
of Form 1a, even though the first item-analysis of Form 1a
showed that it was somewhat overloaded with easy items.


III. ITEM-ANALYSES


Two item-analyses of Form 1a were made. The first was
based on 200 selected cases from among the 3,790 used in stand- 
ardizing the test. The data were obtained from Forts Benning, 
McDowell, Moultrie, and Slocum. The cases selected included 
only those who attempted item 135 or some later item within 
the regular forty-minute time limit. This introduced some bias,
but the correlation between number of items attempted and total
score is generally known to be low when the correction for chance
is applied. The sample, which was termed the ‘Miscellaneous
Forts’ sample, was divided into upper and lower groups of one
hundred each on the basis of total scores, and the number of 
correct responses to every item was determined for each group. 
Corrections have been applied to items 136 to 150, inclusive, 
which were not attempted by everyone tested. The correction 
procedure is described in Appendix I. 

The second item-analysis of Form 1a was based on data obtained 
at Jefferson Barracks. The test was applied without time limit 
to 712 men. The number of correct responses to every item 
was determined for the highest one hundred and the lowest 
one hundred, selected on the basis of total score. 

Both booklets (vocabulary and arithmetic) of the Third Trial 
Form were administered with ‘time sufficient for completion,’ 
to 123 men at Fort Meade and 95 men at Fort Myer. The two 
samples were combined for purposes of item-analysis. The 
entire sample of 218 was divided, for each booklet separately, 
into upper and lower groups of 109, on the basis of total scores 
on that booklet. When the data were analyzed it was found 
that the time allowed had not actually been sufficient to permit 
everyone to attempt every item. In each upper and lower group, 
therefore, the percentage of correct responses to every item was 
determined for all examinees who had attempted that item or 
some later item in the same booklet. 
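
The rule used for the Third Trial Form, counting an examinee as having attempted an item if he answered that item or any later one in the same booklet, can be sketched as follows. The data layout (one list per examinee, with None for an unanswered item) and the function name are assumptions made for illustration.

```python
def percent_correct_among_attempters(responses, item):
    """Percentage of correct responses to `item` (0-based index).

    `responses` holds one list per examinee; each entry is True (right),
    False (wrong), or None (not answered). An examinee is counted as
    having attempted the item if he answered it or any later item.
    Returns None when nobody attempted the item.
    """
    attempted = [r for r in responses
                 if any(x is not None for x in r[item:])]
    if not attempted:
        return None
    right = sum(1 for r in attempted if r[item] is True)
    return 100.0 * right / len(attempted)

# Hypothetical three-item booklet taken by four examinees.
sample = [
    [True, True, None],    # stopped after item 2
    [True, False, True],
    [False, None, None],   # stopped after item 1
    [True, None, True],    # skipped item 2 but reached item 3
]
for i in range(3):
    print(i + 1, percent_correct_among_attempters(sample, i))
```

Note that an item skipped by an examinee who went on to later items counts against that item, which is what the rule implies.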

The item-analysis data for all items of Forms 1a and 1b are
presented in Table I. The index of difficulty for each item of
Form 1a is the percentage of correct responses in the combined
‘Miscellaneous Forts’ and Jefferson Barracks sample of 400.
The index of difficulty for each of the vocabulary and arithmetic 
items of Form 1b is the percentage of correct responses of those 
attempting that item or some later item in the same booklet 
of the Third Trial Form, in the Fort Meade and Fort Myer 
sample of 218. The difficulty values of the Form 1b items have 
been adjusted to make them comparable to the values of the 
Form 1a items. The procedure used in making these adjustments
is described in Appendix II.

The index of discrimination for each of the vocabulary and
arithmetic items of Form 1b is its tetrachoric correlation with
total score on the booklet consisting of items of the same type,
as represented by membership in the upper or lower group of 109
in the Fort Meade and Fort Myer sample. The correlation for
each item is based only on the responses of those who attempted
that item or some later item in the same booklet. The index of
discrimination for each of the items of Form 1a is the average
of three tetrachoric correlations with total score on the entire test.
Two of these correlations were derived from the data of the
Jefferson Barracks sample; one from the data of the ‘Miscellaneous
Forts’ sample. The procedure by which these correlations
were computed and combined is described in Appendix
III.

The indices of discrimination of the vocabulary and arithmetic
items of Form 1b are higher than those of the items of
Form 1a (including the block-counting items which were used in
both forms) for two reasons: (a) They consist of correlations
with tests whose items are all of the same type, while the indices
of discrimination of the Form 1a items consist of correlations
with total score on a test consisting of all three types of items.
(b) They are the indices of discrimination of the items retained,
based on the same sample used in making the original selection,
while the indices of discrimination of the Form 1a items are
based on new samples, and many of these items were selected
without benefit of any previous item-analysis data.


IV. STANDARDIZATION 


The original Army standard score scale was developed on the 
basis of a sample of men tested with Form 1a in September, 1940.
The original sample consisted of the 3,790 soldiers from which the 
‘Miscellaneous Forts’ sample used in the item-analysis was 
drawn, together with 606 enrollees from five CCC camps, making 
a total of 4,396 cases. Since this sample was not representative 
of the general population of potential selectees, a weighting pro- 
cedure was adopted, based on the distributions of age, education, 
and area of residence. 

All of the men tested were white, and the great majority of 
them came from the Eastern seaboard. Geographical data for 
the 3,790 soldiers were available only in the form of Corps Area 
of induction. The states included in the Corps Areas were as
follows:

Corps Area   States Included                                       No. of Cases
1            Conn., Maine, Mass., N. H., R. I., Vt.                         418
2            Del., N. J., N. Y.                                             548
3            D. C., Md., Penna., Va.                                      1,395
4            Ala., Fla., Ga., La., Miss., N. C., S. C., Tenn.             1,377
5-9          All other States                                               604
No data                                                                      54


Of the 4,342 for whom residence data were available, 1,531 were 
under twenty years of age, 99 were thirty or over, and 20 failed 
to report their ages. These were eliminated, along with the 54 
for whom residence data were lacking, and 17 in the twenty to 
twenty-nine age group who did not report educational attain- 
ment, leaving a total of 2,675 in the final standardization sample. 
These latter were divided into the five residence areas indicated 
above; those in each residence area were subdivided into two 
age groups (twenty to twenty-four and twenty-five to twenty- 
nine), and those in each of the ten resulting age-residence groups 
were divided further into three educational attainment groups 
(less than seven years of schooling, seven to nine years, and ten 
or more years), giving a total of thirty cells to be weighted. 

The distributions of age, education, and area of residence of 
the general population were estimated from the 1930 census. 
The percentage of white males aged fourteen and fifteen who 
were in school in 1930 was taken as an estimate of the percentage 
who had reached or exceeded grade VII, and the percentage of 
those aged sixteen and seventeen who were in school at that time 
was taken as an estimate of the percentage who had reached or 
exceeded grade X. The geographical distribution was assumed 
to be the same in 1940 as in 1930, so the distributions by area 
of residence of white males ten to fourteen and fifteen to nineteen 
years of age in 1930 were taken as estimates of the 1940 dis- 
tributions of those twenty to twenty-four and twenty-five to 
twenty-nine years of age, respectively. Differential death 
rates and migration rates were ignored. A table was then con- 
structed containing these estimates of the distribution of the 
general white male population by area of residence, age, and 
education. This table was subdivided into the same thirty 
cells as those of the standardization sample. 

The weighting factor for each cell was the ratio of the population
percentage in that cell to the sample percentage. The sum
of the scores on Form 1a and the sum of the squares of these
scores were computed for all cases in each cell of the sample.
Both sums in each cell were then multiplied by the corresponding 
weighting factors. Finally, the weighted sums for all cells were 
added and the weighted mean and standard deviation were 
computed. The weighted mean of the raw scores was 75.67, and 
the weighted standard deviation was 24.54. It should be noted 
that these figures are estimates for the white male population of 
potential selectees. 
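
The weighting procedure can be sketched as follows. The dictionary keys and the two-cell example are hypothetical; the divisor (the weighted number of cases) and the use of the population form of the standard deviation are assumptions, since the text gives only the outline of the computation.

```python
import math

def weighted_mean_sd(cells):
    """Weighted mean and SD from per-cell sums.

    Each cell is a dict with hypothetical keys:
      n       number of sample cases in the cell
      sum_x   sum of raw scores in the cell
      sum_x2  sum of squared raw scores in the cell
      pop_pct estimated population percentage in the cell
    """
    total_n = sum(c["n"] for c in cells)
    w_n = w_sum = w_sum2 = 0.0
    for c in cells:
        sample_pct = 100.0 * c["n"] / total_n
        weight = c["pop_pct"] / sample_pct   # population % / sample %
        w_n += weight * c["n"]
        w_sum += weight * c["sum_x"]
        w_sum2 += weight * c["sum_x2"]
    mean = w_sum / w_n
    variance = w_sum2 / w_n - mean ** 2
    return mean, math.sqrt(variance)

# Two hypothetical cells, each with two cases, but with population
# shares of 75 and 25 per cent.
cells = [
    {"n": 2, "sum_x": 140, "sum_x2": 10000, "pop_pct": 75.0},  # scores 60, 80
    {"n": 2, "sum_x": 220, "sum_x2": 24400, "pop_pct": 25.0},  # scores 100, 120
]
print(weighted_mean_sd(cells))
```

When the population percentages sum to 100, the weighted number of cases equals the actual number of cases, so the weighting changes only the relative influence of the cells.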

It was recognized that this procedure involved a number of 
unverified assumptions. Among these, the following are perhaps 
the most important: 

1) That three variables—age, education, and area of residence 
—account for most of the test variance which is associated with 
systematic variations in the environment. 

2) That the distribution within each cell of the sample is 
random with respect to the distribution of scores in the cor- 
responding cell of the total population. 

3) That there is no differential death rate for the various 
groups (since the age distributions of the 1940 potential Army 
population were estimated from the 1930 census data). 

4) That the actual Army population would be a random 
sample of the total potential Army population, i.e., that no bias 
would enter into the selection of those men from the total poten- 
tial Army population who were to be drafted. 

The Army Standard Score scale was defined as having a mean 
of 100 and a standard deviation of 20. If we let SS = Standard
Score and RS = raw score, the transformation equation is 


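$$
\begin{aligned}
SS &= RS\,\frac{SD_{SS}}{SD_{RS}} - M_{RS}\,\frac{SD_{SS}}{SD_{RS}} + M_{SS} \\
   &= RS\,\frac{20}{24.54} - 75.67\,\frac{20}{24.54} + 100 \\
   &= .815\,RS + 38.33
\end{aligned}
$$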


The original standard score conversion table for Form 1a was
prepared by substituting all possible raw score values from 0 to
150 in this formula. Since the standard score system is based on a
linear transformation of the raw scores on Form 1a, the fact
that this test is somewhat overloaded with easy items might be
expected to result in a skewed distribution of the standard scores.
This is in fact the case, though other factors are responsible in
part also.
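
The conversion table can be regenerated from the weighted mean and standard deviation reported above. This is a minimal sketch; rounding each standard score to the nearest whole number is an assumption, as the rounding rule used in the original table is not given here.

```python
M_RS, SD_RS = 75.67, 24.54   # weighted raw-score mean and SD (from the text)
M_SS, SD_SS = 100.0, 20.0    # defined mean and SD of the standard score scale

def standard_score(raw):
    """Linear transformation of a Form 1a raw score to an Army standard score."""
    return M_SS + (raw - M_RS) * SD_SS / SD_RS

# Assumed rounding: nearest whole standard score for each raw score 0..150.
conversion_table = {raw: round(standard_score(raw)) for raw in range(151)}

for raw in (0, 50, 100, 150):
    print(raw, conversion_table[raw])
```

Because the transformation is linear, any skewness in the raw-score distribution is carried over unchanged into the standard scores.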

For many purposes the Army used a broad grouping of stand- 
ard scores into five Army Grades. Originally the three central 
groups had standard score ranges of 20. After the test had been 
in use for some time it was found that too many men fell in 
Grade V, and for administrative convenience the minimum 
standard score for Grade IV was lowered ten points. The rela- 
tions between Army standard scores and Army Grades are as 
follows:

                   Army Standard Score Range
Army Grade    Through June, 1942    From July, 1942
I             130 and higher        130 and higher
II            110-129               110-129
III           90-109                90-109
IV            70-89                 60-89
V             69 and lower          59 and lower
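
The grade boundaries in the table above can be expressed as a small lookup. The function name and the flag used to select the earlier or later boundaries are hypothetical.

```python
def army_grade(standard_score, from_july_1942=True):
    """Return the Army Grade (I to V) for an Army standard score.

    The only boundary that changed in July, 1942 is the minimum for
    Grade IV, which was lowered from 70 to 60.
    """
    grade_iv_minimum = 60 if from_july_1942 else 70
    if standard_score >= 130:
        return "I"
    if standard_score >= 110:
        return "II"
    if standard_score >= 90:
        return "III"
    if standard_score >= grade_iv_minimum:
        return "IV"
    return "V"

# A standard score of 65 falls in Grade V under the earlier boundaries
# and in Grade IV under the later ones.
print(army_grade(65, from_july_1942=False), army_grade(65))
```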











Form 1b was standardized by line-of-relation equating to 
Form 1a. A group consisting of 3,856 soldiers, 3,426 white and
430 colored, from eight of the nine corps areas, was employed 
for this purpose. Half of them took Form 1a first and the
other half Form 1b. The line-of-relation technique assumes 
linear regression. The line-of-relation is the principal diagonal 
of the correlation plot—the line lying halfway between the two 
regression lines. The correlation between Forms 1a and 1b
was .95 in a subsample consisting of 495 cases selected by picking 
every seventh pair of answer sheets from each installation, 
and the regressions were both quite close to linear. The raw-score
means and standard deviations are given below, along
with those from the group used in standardizing Form 1a.

              Equating Group          Standardization Group
              N = 3,856               N = 2,675
          AGCT-1b     AGCT-1a         AGCT-1a
Mean        77.6        77.7           75.67
SD          31.4        29.2           24.54

Forms 1c and 1d were constructed with better distributions of
item difficulties. When these forms were equated to Form 1a,
the question arose as to whether to equate percentiles and thus 
preserve the equivalence of standard scores throughout the 
range, or to equate the means and standard deviations and permit 
the skewness due to the overloading of Form 1a with easy items
to disappear. 
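
Equating means and standard deviations is the same linear procedure as the line-of-relation equating used for Form 1b. A minimal sketch follows, using the equating-group means and standard deviations reported for Forms 1a and 1b. It assumes that the line-of-relation passes through the two means with slope equal to the ratio of the standard deviations; the function names are hypothetical.

```python
def line_of_relation(x, mean_x, sd_x, mean_y, sd_y):
    """Score on test Y with the same standing, in SD units, as x on test X."""
    return mean_y + (x - mean_x) * sd_y / sd_x

def form_1b_to_1a(raw_1b):
    """Form 1a raw-score equivalent of a Form 1b raw score (equating group)."""
    return line_of_relation(raw_1b, mean_x=77.6, sd_x=31.4,
                            mean_y=77.7, sd_y=29.2)

for raw in (20, 77.6, 130):
    print(raw, round(form_1b_to_1a(raw), 1))
```

A linear conversion of this kind preserves the shape of the raw-score distribution of the new form, whereas equating percentiles forces the new form to reproduce the shape of the old one.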

The two new forms were first equated to Form 1a by both
methods, and the resulting Army Grade distributions of Forms
1c and 1d were derived. These distributions were averaged,
and the final equivalents were derived in such a manner as to
reproduce the average distribution. At the extremes these
equivalents are closer to the equi-percentile equivalents than
they are to the line-of-relation equivalents; in the central part
of the range they are approximately midway between the two.
Since the distributions of raw scores on Forms 1c and 1d were
almost identical, a single table of equivalents was prepared for
both forms, based on all the raw scores on Forms 1c and 1d
taken together. The equating was done on a sample of 1,782
soldiers from Fort Monmouth and Camp Lee. From a subsample
of 593 cases selected by taking every third set of answer
sheets, the following correlations were obtained:


Forms 1a and 1c   .90
Forms 1a and 1d   .88
Forms 1c and 1d   .92


Form 3a was standardized by equi-percentile equating to
Forms 1c and 1d, using a sample of 39,178 cases stratified by
Service Command (formerly Corps Area), color, age (in three
categories), and years of education (in five categories). The
correlation between Form 3a and Form 1c or 1d (which were
used interchangeably), based on a subsample of 5,000, was .90.
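
Equi-percentile equating can be sketched in its simplest form as follows. This is an illustration only: it uses mid-point percentile ranks and picks the nearest observed score on the second form, with no smoothing or interpolation. The procedure actually followed by the Army is not described in this section, and the function names and data are hypothetical.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(scores_sorted, x):
    """Per cent of scores below x plus half the per cent exactly at x."""
    below = bisect_left(scores_sorted, x)
    at = bisect_right(scores_sorted, x) - below
    return 100.0 * (below + 0.5 * at) / len(scores_sorted)

def equipercentile_equivalent(x, scores_a, scores_b):
    """Observed score on form B whose percentile rank is closest to
    the percentile rank of x on form A."""
    a = sorted(scores_a)
    b = sorted(scores_b)
    target = percentile_rank(a, x)
    candidates = sorted(set(b))
    return min(candidates, key=lambda y: abs(percentile_rank(b, y) - target))

# Hypothetical score distributions on two forms.
form_a = [10, 20, 30, 40, 50]
form_b = [12, 24, 36, 48, 60]
print(equipercentile_equivalent(30, form_a, form_b))
```

Unlike the line-of-relation method, this approach makes no assumption of linear regression; the equivalents follow whatever shape the two distributions have.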

Form 3b was standardized by equi-percentile equating to 
Form 3a, using a sample of 1,000 drawn from approximately 
1,500 cases from Camp Atterbury, in such a manner as to match 
approximately the proportions of white and colored soldiers, 
and also to match the distribution of Form 3a standard scores to 
that of the Army as a whole. The correlation in this sample
between Forms 3a and 3b was .94.


V. NORMS AND EQUIVALENT SCORES ON OTHER TESTS


The basic norms for the Army General Classification Test, and 
equivalent scores on three other tests, are presented in Table II. 

Column (1) shows Army standard scores as they have been
defined since the standardization of Forms 1c and 1d. Column
(2) shows the corresponding original Army standard scores used
with Forms 1a and 1b. As noted in the previous section,
approximately half the skewness introduced by the excess of
easy items in Forms 1a and 1b was removed when Forms 1c and
1d were standardized. The greatest discrepancy between these
two sets of standard scores is five points, occurring in the region
of standard score 55. Discrepancies of two points occur near
the middle of the scale, and discrepancies of four points occur
near the top. Column (3) shows adjusted Army standard
scores. These scores have never been used by the Army. They
were derived by line-of-relation equating of Forms 1c and 1d to
Form 1a, using the data of the Form 1c-1d standardization
sample of 1,782, and then basing the standard score distribution
entirely on the raw score distribution of Forms 1c and 1d in this
sample. This distribution is more nearly normal than either
of the others. All three distributions of standard scores come
together at the points one standard deviation above and below
the mean, i.e., at standard scores 80 and 120.

The Army General Classification Test is unsuitable for measur- 
ing the abilities of persons having less than the functional 
equivalent of a fourth-grade education at the time of testing. 
Whenever large unselected groups are tested, therefore, there is a 
piling-up of raw scores in the zero-to-four region. This effect is 
considerably more marked with the later editions of the test than 
it is with Forms 1a and 1b. In preparing the adjusted Army
standard scores, therefore, the sixty-three raw scores of zero
through four on Forms 1c and 1d were arbitrarily distributed
across the region from four through minus twenty, in such a manner
as to match the distribution of the sixty-three top scores,
which range from 110 through 134.

In using the Army General Classification Test, no attempt 
should be made to interpret standard scores below 50, or adjusted 
standard scores below 56, or original standard scores below 46. 
Such scores may be due to illiteracy rather than low ability.
The main conclusion from such a score is that the individual who
receives it should be re-examined with a non-verbal test.

Column (4) shows centile norms for all those who entered the 
World War II Army from November 1940 through August 1945, 
a total of approximately 9,340,000. The method of derivation 
of these norms is described in Appendix IV. Each figure in this 
column indicates the percentage of the total group who obtain 
the corresponding score or some lower score. 

Column (5) shows the Form 1a raw scores corresponding to the 
three types of standard scores and the World War II Army 
centiles given in the first four columns. Column (6) shows the 
corresponding Form 1b raw scores. It will be noted that despite
the original effort to match the Form 1b item-difficulty distribution
to that of Form 1a, the effective range of measurement of
Form 1b is slightly less at both ends of the scale.

The Army General Classification Test has been equated to 
three other well-known group intelligence tests by the equi- 
percentile method. Column (6) shows equivalent raw scores 
on the Otis Self-Administering Test of Mental Ability, Higher 
Examination, based on a sample of 1,646 men, 146 from Fort 
Sam Houston and 1,500 from Fort Monmouth, who were tested 
in the Spring of 1941. The Otis test was administered with a
thirty-minute time limit. Form 1a of the Army General
Classification Test was used, and the data for this test were
reported only in terms of Army grades. The Otis equivalents 
of Army standard scores below 70 and above 129 are therefore 
extrapolated. The correlation between the two tests, using 
Sheppard’s corrections to the standard deviations, is .83. 

Column (7) shows equivalent raw scores on the Wells Revised 
Alpha, Form 5, based on a sample of 768 men at Camp Lee who 
were tested in April 1945. The correlation between the two 
tests in this group is .90. 

Column (8) shows equivalent raw scores on the 1942 edition 
of the ACE Psychological Examination, based on the scores 
of 911 men at Fort Harrison and Fort Knox who were tested 
in March 1945. The group was selected by drawing a sample of 
1,371 high-school graduates whose AGCT distribution matched 
closely the distribution of all high-school graduates in the Army, 
and then drawing from this sample a subsample whose ACE 
distribution matched closely the distribution of all college freshmen
and applicants who took the ACE test in 1942. The
correlation in the equating group of 911 is .75. For purposes 
of further comparison, the centile norms for the 1942 college 
freshmen group are reproduced from the ACE report in Col- 
umn (9). These college freshmen centiles are based on 49,020 
scores from two hundred fifty-three colleges. 

All scores in any row of Table II may be considered equivalent. 
All of the equating groups except the one used in equating the 
ACE test yielded distributions of Army standard scores match- 
ing fairly well the distribution for the Army as a whole. The 
ACE equivalents are accurate only for groups similar to the 
one on which the ACE norms are based. 

The Army General Classification Test covers a wider range of 
ability than any of the others. Though the ACE test has more 
‘top,’ the AGCT is still measuring effectively at the 99.5 centile 
of college applicants. The Wells Revised Alpha has as much 
‘bottom’ but somewhat less ‘top.’ The Otis test (Higher 
Examination) has less ‘bottom,’ as would be expected, but it 
also has less ‘top.’ 

It may be of some interest to compare the two sets of centiles.
Seven per cent of the college freshmen group lie at or
below the 50th centile for the Army; eighty-four per cent of the
Army lie at or below the 50th centile for the college freshmen
group.


VI. THE DISTRIBUTION OF TEST-INTELLIGENCE


Figure I shows the distribution of Army standard scores on 
which the centile norms given in Column (4) of Table II are 
based. The method by which this distribution was derived is 
described in Appendix IV. 

Figure II shows the corresponding distribution of adjusted 
Army standard scores. This distribution was prepared in the 
same manner as the one shown in Figure I, starting with a crude 
ogive drawn through points plotted directly from Columns (3) 
and (4) of Table II, and making successive adjustments to 
columns representing three-point adjusted Army standard score 
ranges in accordance with the criteria given in paragraph 9 of 
Appendix IV. 

As was anticipated, the distribution of adjusted Army standard
scores shown in Figure II is considerably less skewed than
is the distribution of actual Army standard scores shown in
Figure I. Since all the Army data apply equally to both sets of
standard scores, the adjusted Army standard scores should
prove more generally useful, as they do not contain the artificial
skewness introduced by the excess of easy items in Form 1a.

A feature of both distributions which is of considerable import- 
ance is the bulge occurring in the region between standard 
scores 60 and 80. This bulge is a characteristic of the popula- 
tion, not an artifact of the units of measurement. It appears in 
all four of the sample distributions described in paragraph 9 of 
Appendix IV. One of these distributions (N = 644,065) is 
based on Forms 1a and 1b, reported in original standard scores,
which are linearly related to the raw scores on these forms.
Another (N = 1,782) is based on Forms 1c and 1d, reported in
raw scores, one on each form for each examinee, so that the 
total distribution contains 3,564 scores. The bulge appears 
more markedly in Figure II than in Figure I because it is not 
attenuated by the artificial skewness in the latter. 

In attempting to account for this bulge we must remember 
that we are considering an adult population, and that two- 
thirds of the test consists of reading vocabulary and arithmetic 
items, which measure the basic contents of instruction in the 
elementary school. We might assume that as a result of their 
elementary education some individuals acquire a fundamentally 
positive attitude toward such materials, while others acquire a 
fundamentally negative attitude. The first group would tend 
generally to continue their education, at least through high 
school whenever possible, to continue reading for information 
and pleasure, and to seek or at least not avoid occupations 
requiring the exercise of their verbal and numerical abilities. 
The second group would tend generally to drop out of school as 
soon as possible, to avoid all unnecessary reading, and to seek 
occupations which do not involve any reading or arithmetic. 
The first group would maintain and in most cases improve in 
these abilities after reaching adulthood; the second would 
forget the skills learned in school and retrogress in their com- 
mand of such materials. 

In order to reconcile this theory with the data we must assume 
that the second group includes a considerable proportion of the 
total population, perhaps as much as one-fifth or one-fourth.
This estimate was derived by ‘reflecting’ the part of Figure II
lying to the right of the mode, and then plotting the distribu- 
tion represented by the excess frequency to the left of the mode. 
This difference distribution had a mode at 71 and contained 
22.2 per cent of the total frequency. Most of the people in 
this group would not be classed as illiterate; the postulated 
retrogression is partial rather than complete. The bulge would 
also probably come lower in the distribution if one-third of 
the test did not consist of block-counting items, which meas- 
ure an ability that is presumably affected only slightly by formal 
schooling. 
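
The reflection procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the histogram is invented (it is not the Figure II data), the names `reflect_excess`, `freqs`, and `mode_index` are hypothetical, the class intervals are taken to be of equal width, and negative differences are treated as zero excess.

```python
def reflect_excess(freqs, mode_index):
    """Excess frequency left of the mode after mirroring the right side.

    freqs      -- frequencies for equal-width score intervals, low to high
    mode_index -- position of the modal interval within freqs
    Returns a list the same length as freqs; entries at and to the right
    of the mode are zero.
    """
    excess = [0.0] * len(freqs)
    for offset in range(1, mode_index + 1):
        left = mode_index - offset
        right = mode_index + offset
        # Mirror image of the right-hand frequency (zero past the top end).
        mirrored = freqs[right] if right < len(freqs) else 0.0
        # Negative differences are treated as zero excess (an assumption).
        excess[left] = max(freqs[left] - mirrored, 0.0)
    return excess


# Invented percentages, for illustration only.
freqs = [1.0, 4.0, 9.0, 12.0, 20.0, 12.0, 6.0, 2.0, 1.0]
excess = reflect_excess(freqs, mode_index=4)
share = sum(excess) / sum(freqs)      # proportion of cases in the excess group
peak = excess.index(max(excess))      # interval in which the excess peaks
```

Applied to the actual frequencies of Figure II, `share` would correspond to the 22.2 per cent quoted above and `peak` to the interval containing the mode at 71.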

Other factors such as direct inequalities of educational oppor- 
tunity and the error described in paragraph 5 of Appendix IV 
are undoubtedly responsible for part of this effect, but no alterna- 
tive explanation so far suggested seems sufficient to account for 
all of it. 

The basic statistical constants of the two distributions are: 


Distribution                     Fig.   Mean    SD      Median 
Army standard scores             I      97.3    24.0    99.8 
Adjusted Army standard scores    II     97.3    24.2    97.9 


The exact location of the mode cannot be determined. The 
usual formula, Mode = Mean + 3 (Median - Mean), breaks 
down entirely, since the bulge in the distribution brings the 
median much closer to the mean than it would be otherwise. 
The apparent modes as shown in Figures I and II are not accu- 
rately determined either. Arbitrary changes in the relative 
heights of adjacent columns in this region, involving shifts of 
as little as one-tenth of one per cent of the total frequency, 
could change the apparent mode in either as much as three or 
four standard score points. 
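
As a rough check, the usual formula can be applied to the constants tabulated above. The function name `empirical_mode` and the dictionary layout are illustrative only; the numbers are those given in the table.

```python
def empirical_mode(mean, median):
    """Usual empirical formula: Mode = Mean + 3 (Median - Mean)."""
    return mean + 3.0 * (median - mean)


# (mean, median) pairs taken from the table of basic constants.
constants = {
    "Army standard scores": (97.3, 99.8),
    "Adjusted Army standard scores": (97.3, 97.9),
}
modes = {name: round(empirical_mode(mean, median), 1)
         for name, (mean, median) in constants.items()}
```

The formula places the mode at about 104.8 for the first distribution and about 99.1 for the second, values which may be compared with the apparent modes of Figures I and II.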

None of the common frequency curves have one point of 
inflection on one side of the mode and two on the other, so any 
attempt at further mathematical analysis of the distribution 
would appear to be fruitless. 

There is an interesting discrepancy between the centile norms 
for the Army General Classification Test given in column (4) 
of Table II and the norms given in the manual for the Otis 
Group Intelligence Test. The Otis norms are generally considered 
to have been derived very carefully. They were used 
along with the norms for the original Stanford Achievement 
Test as the foundation of the Scaled Score system used with 
the Coöperative Achievement Tests. The Otis manual gives 
the raw score 42 as the adult average on the Higher Examina- 
tion when applied with thirty-minute time limit. Referring 
to Table II we see that this corresponds to an Army standard 
score (and an adjusted Army standard score) of 120 or 121. 
This is ten or twelve standard score points above even the 
apparent mode of the Army distribution, which might reason- 
ably be taken as the average for the group who remain in school 
as long as possible and continue intellectual pursuits after they 
leave school. It is only two or three points below the ACE 
median of 1942 college freshmen and applicants. The conclusion 
appears inescapable that the Otis adult norm is representative 
of those who actually continue their education, either 
formally or informally, throughout the period of late adolescence 
and on into early maturity. 

The fiftieth centile of the Army distribution corresponds to 
an Otis raw score of 28. According to the Otis manual this 
score is average for age 13-0. This is the age at which most 
children are completing or nearing the completion of their ele- 
mentary schooling. Almost all children are in school at this 
age, including those who will later form the ‘non-intellectual’ 
group postulated above. These latter should have just about 
reached the maximum of their functional verbal and arithmetical 
abilities, under the long-continued pressure of the school. Soon 
after this age they should begin to drop out of school and lose 
some part of their basic skills. The loss in arithmetic will 
probably be greater and more widespread than the loss in read- 
ing vocabulary. Mental growth will continue for several years, 
but at a lower rate than this forgetting. 

The average adult undoubtedly has considerably more mental 
capacity than he had at the age of thirteen. Intelligence tests do 
not measure native mental capacity. They measure actual performance 
on the test questions. A test is a fairly valid measure 
of the native capacities which underlie the abilities tapped by its 
questions when every one tested has had equal opportunities 
and equal incentives to develop the abilities measured. It has 
maximum validity of this type when every one tested has had 
maximum opportunity and maximum incentive over a considerable 
period of time extending right up to the date of testing. 
The verbal and arithmetical elements of a group intelligence 
test should, therefore, possess maximum validity as measures of 
the corresponding native capacities at the close of the period of 
elementary education. The decrease in such validity thereafter 
appears to be greater than has been generally supposed. 


VII. EVALUATION 


From the data given in Section V of this paper and from other 
studies not reported here, it may be concluded that in a full- 
range group of adults the Army General Classification Test has a 
comparable-forms reliability of about .95. In samples of this 
range its correlations with other group intelligence tests range 
from about .80 to .90, depending on the degree of similarity of 
the test materials. It covers a wider range of ability than most 
other group intelligence tests designed for use with adults. 
Only a few tests are easier to administer and score. Despite the 
fact that it contains only three types of questions, the range of 
different types of ability tapped is probably a little greater, and 
the balance among these abilities a little better, than in most 
previous group tests of intelligence. The norms given in this 
paper are probably reasonably representative of the young adult 
male population of America. 

Most of the items of Forms 1a and 1b are good items. A few 
are not. These forms have an excess of easy items, but they 
also have enough hard items to measure differences among the 
top one per cent of the population. They do not measure 
differences among the bottom three per cent well, because no 
group test of verbal intelligence is suitable for measuring the 
abilities of people in this range. 

The distribution of test-intelligence is not normal. The dis- 
tributions of the native capacities which underlie test-intelligence 
probably are. An explanation of this discrepancy has been 
advanced: namely, that about three-fourths to four-fifths of the 
population respond favorably to elementary school education 
and continue to improve their fundamental skills throughout 
adolescence and on into early maturity, while one-fourth to one-fifth 
respond unfavorably and let these skills degenerate after 
completing the period of compulsory education. This explanation, 
if accepted, appears to pose important problems for public 
education in coming years. A corollary, supported by additional 
data, is that group intelligence tests are considerably less valid 
as measures of the native intellectual capacities of adults than 
they are as measures of the native intellectual capacities of school 
children. 

Data have been gathered on the relations between scores on 
the Army General Classification Test and many other variables 
such as age, education, race, area of residence, occupation, grades 
in service school courses, and ratings of job proficiency. The 
analysis and collation of these data is as yet incomplete.* It is 
hoped that it can be completed and published in the not too 
distant future. 





APPENDIX I.—ESTIMATION OF DIFFICULTIES OF THE LAST FIVE 
ITEMS OF AGCT-1A IN THE MISCELLANEOUS FORTS SAMPLE 


Estimates of the difficulties of the last five items of each type 
(vocabulary, arithmetic, block counting) were made separately 
for the upper one hundred cases and the lower one hundred cases 
of the sample—a total of six groups. The following paragraphs 
describe the procedure of estimation in terms of one such group. 

1) The average number of omits was computed for the five 
items of each cycle except the first, which contained ten items, 
and the last, whose average was to be estimated. There were 
seven such averages. 

2) A parabola was fitted by least squares to the seven averages, 
and extrapolated one step to provide an estimate of the average 
number of omits in the last set of five items if everyone had 
finished the test. The slope of the curve at the point of this 
estimated average was computed also. 

3) An estimated trend value was obtained for each of the last 
five items. The estimated average from the fitted parabola was 
taken as the value of the middle item of the five. The other 
four trend values were taken as lying on a straight line passing 
through this average and having the same slope as that of the 
curve at this point. This procedure introduced no appreciable 
additional error, since the curvatures of the parabolic curves were 
not large, and the base-line distances between adjacent pairs of 
averages represented fifteen items (five of each type). 

* The first study involving some portion of these data has already been 
published. See Stewart, Naomi, “A.G.C.T. Scores of Army Personnel 
Grouped by Occupation.” Occupations, Oct. 1947. 

4) A straight line was fitted by least squares to the actual 
number of omits for each of the last five items, to define the actual 
trend. The means of the actual trends in the six groups were all 
considerably above the estimated means. The differences were 
smallest for the vocabulary items (Numbers 136-140), and 
greatest for the block-counting items (Numbers 146-150), as 
would be expected on the assumption that such differences were 
due to progressively greater numbers of individuals failing to 
complete the later items within the time limit. 

5) The actual trend value for the number of omits for each of 
the last five items was subtracted from the number actually 
omitted to obtain the deviation from trend. These deviations 
(some positive and some negative) were added algebraically to 
the estimated trend values (paragraph 3) to provide estimates of 
the number of omits for each item if everyone had finished the 
test. 

6) For each item, the number actually omitted was subtracted 
from 100 to obtain the number actually attempted, and the esti- 
mated number of omits was subtracted from 100 to obtain an 
estimate of the number that would have been attempted if every- 
one had finished the test. The ratio of the number right to the 
number actually attempted was then multiplied by the estimated 
number of attempts to obtain the final estimate of difficulty—the 
number that would have been right if everyone had finished the 
test. 
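
The six steps above can be sketched as follows for one group of one hundred cases. This is an illustration under stated assumptions, not the original computing routine: all data are invented, the function names are hypothetical, and the slope of the parabola is converted from cycle units to item units by dividing by fifteen, which is one reading of paragraph 3.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    size = degree + 1
    matrix = [[sum(x ** (j + k) for x in xs) for k in range(size)]
              for j in range(size)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(size)]
    return solve(matrix, rhs)


def estimate_rights(cycle_means, omits, rights,
                    group_size=100, items_per_cycle=15):
    """Estimated number right on each of the last five items (paragraphs 2-6)."""
    # Paragraph 2: parabola through the cycle averages, extrapolated one step.
    xs = list(range(len(cycle_means)))
    a, b, c = polyfit(xs, cycle_means, 2)
    x_next = len(cycle_means)
    est_mean = a + b * x_next + c * x_next ** 2
    # Assumption: slope per cycle divided by fifteen gives slope per item.
    slope_per_item = (b + 2 * c * x_next) / items_per_cycle

    # Paragraph 3: straight-line trend centred on the middle item.
    mid = (len(omits) - 1) / 2
    est_trend = [est_mean + (i - mid) * slope_per_item
                 for i in range(len(omits))]

    # Paragraph 4: least-squares line through the actual omits.
    positions = list(range(len(omits)))
    a0, a1 = polyfit(positions, omits, 1)
    actual_trend = [a0 + a1 * i for i in positions]

    # Paragraph 5: deviations from actual trend added to estimated trend.
    est_omits = [t + (o - at)
                 for t, o, at in zip(est_trend, omits, actual_trend)]

    # Paragraph 6: ratio right/attempted times estimated attempts.
    return [r / (group_size - o) * (group_size - eo)
            for r, o, eo in zip(rights, omits, est_omits)]


# Invented data for one group of 100 cases.
cycle_means = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]   # seven cycle averages
omits = [40, 42, 45, 47, 51]                            # last five items
rights = [30, 28, 25, 20, 18]
estimates = estimate_rights(cycle_means, omits, rights)
```

With these invented figures the parabola extrapolates to an average of 29 omits, so the middle item, actually attempted by 55 men, is credited with the number right that 71 attempts would have yielded at the same rate of success.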


APPENDIX II.—ADJUSTMENT OF ITEM-DIFFICULTIES OF THE ITEMS 
OF AGCT-1B TO MAKE THEM COMPARABLE TO THOSE OF THE 
ITEMS OF AGCT-1A 


1) The difficulty values of the Form 1a arithmetic items in the 
combined ‘Miscellaneous Forts’ and Jefferson Barracks samples 
were plotted against their difficulties when applied in the context 
of the Third Trial Form to the Fort Meade and Fort Myer 
sample. The correlation was .97, and the means and standard 
deviations were as follows: 


Sample                        Variable    Mean     SD 
Misc. Forts and Jeff. Bar.    X           76.50    19.55 
Forts Meade and Myer          Y           67.30    24.07 





2) We will assume, for the vocabulary items as well as for the 
arithmetic items of Form 1b, that if they had been applied to 
the combined ‘Miscellaneous Forts’ and Jefferson Barracks 
sample as well as to the Fort Meade and Fort Myer sample, 
the ratio of the standard deviations of their difficulty values in 
the two samples would have been the same as the ratio found 
for the Form 1a arithmetic items. This ratio, $SD_x/SD_y$, is very 
close to .81 (actually .8122). We will also assume that the difference 
between the means of these difficulty values in the two samples, 
expressed in units of their standard deviations in the Fort Meade 
and Fort Myer sample, would have the same value as the 
one found for the Form 1a arithmetic items. This value, 
$(M_x - M_y)/SD_y$, is quite close to .38 (actually .3822). 

3) The equation for transforming a set of scores Y having a 
given mean and standard deviation into an equivalent set of 
scores X having some other arbitrary mean and standard deviation 
may be written, 

$$X = \frac{SD_x}{SD_y}\,Y + M_x - \frac{SD_x}{SD_y}\,M_y$$

We wish to use this equation to transform the difficulty values 
of the Form 1b vocabulary and arithmetic items, as determined 
from the Fort Meade and Fort Myer sample, to approximately 
equivalent values for the combined ‘Miscellaneous Forts’ and 
Jefferson Barracks sample. 

By the first assumption of paragraph 2 we may use the value 
.81 for the ratio $SD_x/SD_y$. By the second assumption we may 
replace $M_x$ by $.38\,SD_y + M_y$. Substituting these values in the 
equation we have, 

$$X = .81\,Y + .19\,M_y + .38\,SD_y$$


The values of My and SDy were obtained by direct computation 
from the Fort Meade and Fort Myer data. 
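
The transformation can be checked numerically. In the sketch below the ratio and the mean shift are recomputed from the constants tabulated in paragraph 1; the function name `adjust_difficulty` is illustrative, and the Form 1b mean and standard deviation (70.0 and 20.0) and the item difficulty (60.0) are invented, since the actual Fort Meade and Fort Myer values for Form 1b are not given here.

```python
def adjust_difficulty(y, mean_y, sd_y, sd_ratio=0.81, mean_shift=0.38):
    """X = sd_ratio * Y + (1 - sd_ratio) * M_y + mean_shift * SD_y."""
    return sd_ratio * y + (1.0 - sd_ratio) * mean_y + mean_shift * sd_y


# Constants for the Form 1a arithmetic items (table in paragraph 1).
mean_x, sd_x = 76.50, 19.55     # Misc. Forts and Jefferson Barracks
mean_y, sd_y = 67.30, 24.07     # Forts Meade and Myer
sd_ratio = sd_x / sd_y                     # about .8122
mean_shift = (mean_x - mean_y) / sd_y      # about .3822

# Sanity check: with the unrounded constants, the Y mean maps onto the X mean.
check = adjust_difficulty(mean_y, mean_y, sd_y, sd_ratio, mean_shift)

# Hypothetical Form 1b item: 60 per cent correct, with invented M_y and SD_y.
example = adjust_difficulty(60.0, mean_y=70.0, sd_y=20.0)
```

With the rounded coefficients, the hypothetical item moves from 60.0 to .81(60) + .19(70) + .38(20) = 69.5 per cent correct.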


APPENDIX III.—COMPUTATION OF INDICES OF DISCRIMINATION OF 
THE ITEMS OF AGCT-1A 


1) The ‘Miscellaneous Forts’ sample of 200 was divided into 
upper and lower groups of 100 each. The tetrachoric correlation 
of each item with total score, as represented by membership 
in one or the other of these groups, was computed. Each correlation 
was based on all 200 cases. For the last five items of 
each type, the total numbers of correct responses in upper and 
lower groups as estimated in Appendix I were used. 

2) From the complete Jefferson Barracks sample of 712, 
counts of correct responses were made only for the top and bottom 
100 cases. If we assume that the index of difficulty of each 
item is reasonably well represented by the average number of 
correct responses in these two extreme groups, we have the 
situation represented in the following table: 





                    Top 100    Middle 512      Bottom 100    Total 
Number right        a          undetermined    b             7.12(a + b)/2 
Number not right    c          undetermined    d             7.12(c + d)/2 
Total               100        512             100           712 


From this table it is possible to derive two estimates of the 
tetrachoric correlation, one based on a and c, and one on b and 
d. The four-fold tables, with marginal totals reduced to per- 
centages, are as follows: 


                      Top 100    Bottom 612      Total 
Per cent right        a/7.12     undetermined    (a + b)/2 
Per cent not right    c/7.12     undetermined    (c + d)/2 
Total per cent        14.0       86.0            100 


                      Top 612         Bottom 100    Total 
Per cent right        undetermined    b/7.12        (a + b)/2 
Per cent not right    undetermined    d/7.12        (c + d)/2 
Total per cent        86.0            14.0          100 


The tetrachoric correlation computed from each of these tables 
will be based on all 712 cases in every respect except the row 
total, or estimate of item difficulty, (a + b)/2. 

3) Since difficulty estimates are in general more reliable than 
estimates of tetrachoric correlation, it is believed that the correla- 
tions based on the 712 cases of the Jefferson Barracks sample 
should be more reliable than those based on the 200 cases of the 
‘Miscellaneous Forts’ sample, despite the fact that the difficulty 
estimates entering into the former are based on only the 200 
cases in the top and bottom groups. As a working compromise, 
therefore, the two estimates of the correlation in the Jefferson 
Barracks sample were averaged with the one estimate of the 
correlation in the ‘Miscellaneous Forts’ sample to obtain the 
final estimate of the discrimination of each item of Form 1a. 

4) The tetrachoric correlations for the Jefferson Barracks 
sample were computed from the Chesire-Saffir-Thurstone com- 
puting diagrams; all others were computed from a special chart 
for median-split tetrachoric correlations prepared by M. W. 
Richardson. 
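
The original correlations were read from computing diagrams and a chart, which cannot be reproduced here. As a stand-in only, the sketch below uses the cosine-pi approximation to the tetrachoric correlation. This is a different technique, reasonably close only when both variables are split near the median, as in the ‘Miscellaneous Forts’ sample; it is not suited to the 14/86 split of the Jefferson Barracks tables. All counts and the two Jefferson Barracks correlations are invented, the function names are hypothetical, and the equal weighting of the three estimates is one reading of paragraph 3.

```python
import math


def cos_pi_tetrachoric(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a, b -- number right in the upper and lower groups
    c, d -- number not right in the upper and lower groups
    """
    concordant = a * d
    discordant = b * c
    if discordant == 0:
        return 1.0 if concordant > 0 else 0.0
    if concordant == 0:
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt(concordant / discordant)))


def final_index(r_misc_forts, r_jb_top, r_jb_bottom):
    """Paragraph 3: the three estimates averaged (equal weights assumed)."""
    return (r_misc_forts + r_jb_top + r_jb_bottom) / 3.0


# Invented median-split counts: 80 right in the upper 100, 40 in the lower 100.
r_item = cos_pi_tetrachoric(a=80, b=40, c=20, d=60)

# Marginal percentage of each extreme group in the Jefferson Barracks tables.
extreme_per_cent = round(100 / 7.12, 1)

# Invented Jefferson Barracks estimates, averaged with the estimate above.
index = final_index(r_item, 0.58, 0.66)
```

For the invented counts the approximation gives a correlation of about .61.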


APPENDIX IV.—DERIVATION OF THE AGCT CENTILE NORMS FOR 
THE WORLD WAR II ARMY 


1) From November 1940, when Form 1a was first put into 
general use, until the end of August 1941, all standard scores 
were reported to The Adjutant General. Beginning in September 
1941, only Army Grades were reported for new inductees and 
selectees. Forms 1c and 1d were introduced in November 1941, 
and it was estimated that by January 1942, more men were 
being tested with these forms than with Forms 1a and 1b. The 
lower limit of Army Grade IV was lowered from 70 to 60 on 
July 1, 1942. Consolidations of the Army Grade distributions, 
which were reported monthly, were therefore made for three 
periods: November 1940 through December 1941, January 1942 
through June 1942, and July 1942 through August 1945 (the 
month in which the Japanese finally surrendered). 

2) Referring to the data presented in Columns (1) and (2) of 
Table II, a determination was made of the ranges of original 
standard scores corresponding to the Army Grade ranges defined 
by the later standard scores, with the lower limit of Grade IV 
set at 60. These ranges were as follows: 

Army Grade as Defined by    Range of Original 
Later Standard Scores       Standard Scores 
I                           128 and higher 
II                          111-127 
III                         92-110 
IV                          55-91 
V                           54 and lower 


Army Grades based on Forms 1a and 1b were actually reported 
in terms of the following ranges of original standard scores: 


Army Grade as Defined by    Range of Original 
Original Standard Scores    Standard Scores 
I                           130 and higher 
II                          110-129 
III                         90-109 
IV                          70-89 
V                           69 and lower 


Referring to the actual distribution of 644,065 original standard 
scores reported from November 1940 through August 1941, the 
adjustments indicated in the following tabulation were derived. 


Range of Original    Original Army Grade    Percentage of Frequency    To be Moved into 
Standard Scores      in which Found         in that Grade              Army Grade 
55-69                V                      61.6                       IV 
90-91                III                    8.0                        IV 
110                  II                     5.2                        III 
126-129              II                     6.8                        I 


The original standard scores 67-69 correspond to later standard 
scores 70 and 71, and the original standard scores 55-66 corre- 
spond to later standard scores 60-69. All of these scores were 
therefore moved into Grade IV. 

The adjustments shown in the table above were applied to 
the 962,399 cases reported by original Army Grade from Novem- 
ber 1940 through December 1941, the period during which most 
of the testing was done with Forms 1a and 1b. The 962,399 
cases for this period include the 644,065 on which the adjust- 
ments were determined. 








408 The Journal of Educational Psychology 


3) The second consolidation of Army Grade distributions 
covers the period from January 1942—by which time it was 
estimated that more than half the tests given were Forms 1c 
and 1d—through June 1942—the last month in which the lower 
limit of Grade IV was standard score 70—and includes a total 
of 1,284,471 scores. In order to check on the standardization 
of Forms 1c and 1d, approximately 500,000 standard scores had 
been collected, covering the period from July through September, 
1942. A subsample of 41,498 cases was drawn in such a manner 
as to match approximately the complete distribution of Army 
Grades by Service Commands for the period from October 
through December, 1942. From the standard score distribu- 
tion of this subsample it was found that 42.7 per cent of the 
standard scores below 70 were in the range 60-69. The fre- 
quencies of the second consolidation were therefore adjusted by 
transferring 42.7 per cent of the cases in Grade V to Grade IV. 
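
The percentage transfers of paragraphs 2 and 3 amount to moving a stated share of one grade's frequency into another grade. A minimal sketch follows; the grade counts are invented, the names `move_shares`, `FIRST_PERIOD`, and `SECOND_PERIOD` are hypothetical, and each percentage is taken to refer to the frequency originally reported in the source grade.

```python
def move_shares(counts, transfers):
    """Move a share of each source grade's original frequency to a target grade."""
    adjusted = dict(counts)
    for source, target, share in transfers:
        moved = counts[source] * share     # share of the ORIGINAL frequency
        adjusted[source] -= moved
        adjusted[target] += moved
    return adjusted


# Paragraph 2: adjustments for the first consolidation (Forms 1a and 1b).
FIRST_PERIOD = [("V", "IV", 0.616), ("III", "IV", 0.080),
                ("II", "III", 0.052), ("II", "I", 0.068)]
# Paragraph 3: adjustment for the second consolidation.
SECOND_PERIOD = [("V", "IV", 0.427)]

# Invented grade counts, for illustration only.
counts = {"I": 1000.0, "II": 3000.0, "III": 3000.0, "IV": 2000.0, "V": 1000.0}
first = move_shares(counts, FIRST_PERIOD)
second = move_shares(counts, SECOND_PERIOD)
```

The total frequency is unchanged by either adjustment; only its allocation among the five grades is altered.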

4) It should be noted again that Forms 1c and 1d were introduced 
in October 1941, and that Forms 1a and 1b were still 
in use to some extent in June 1942. The estimate that the frequency 
of use of Forms 1c and 1d passed that of Forms 1a and 
1b on January 1, 1942 is rather rough. The adjustments 
appropriate to scores on Forms 1a and 1b as described in paragraph 2, 
above, were applied to some considerable number of 
Form 1c and 1d scores. The adjustments appropriate to Forms 
1c and 1d as described in paragraph 3 were applied to a roughly 
equal number of Form 1a and 1b scores. These errors should 
compensate each other in part, but not entirely. Some few 
copies of Forms 1a and 1b were used after July 1, 1942, and the 
corrections described in paragraph 2 were not applied to them, 
though they were reported with the lower limit of Army Grade 
IV at the original standard score value of 60 rather than 70. It 
is highly improbable that all the uncorrected errors of the adjustment 
procedures described in paragraphs 2 and 3 could bias any 
of the final centile values by as much as one point, however. 

5) The third consolidation of Army Grade data covered the 
period from July 1942 through August 1945, and included a total 
of 6,736,076 cases. No adjustments were made to the fre- 
quencies in the five Army Grades for this group; the adjustments 
to the frequencies of the first and second consolidations were 
designed to make them comparable to those of the third. From 
August 1943 through August 1945, however, men failing certain 
preliminary tests administered at Induction Stations were sent 
directly to Special Training Units. Others who passed these 
tests but were later found to be illiterate after taking the Army 
General Classification Test at Reception Centers were also sent 
to the Special Training Units. These units provided short 
intensive courses designed to bring the men up to minimum 
literacy standards. At the conclusion of special training all of 
the men were given the AGCT, including those who had already 
taken it at Reception Centers. Since it was impossible to 
separate the post-training test scores of those who had previously 
taken the AGCT from those who had not, and since the majority 
of trainees fell in the latter category, the post-training figures 
were used throughout. Insofar as special training improved the 
abilities measured by AGCT, and insofar as practice effect fur- 
ther improved the scores of those who had taken the test pre- 
viously, a bias was introduced by this procedure whose principal 
effect would be some slight augmentation of the frequencies in 
Army Grade IV at the expense of those in Army Grade V. The 
total number of special trainees included among the 6,736,076 
was 267,310, of whom 90,177 had taken the AGCT previously. 
The possible alternatives to the use of post-training scores—the 
omission of the scores of those sent directly to the Special Train- 
ing Units, or the assumption that the distribution of their scores 
would have approximated that of the men who passed the Induc- 
tion Station tests and were later referred to special training 
from the Reception Centers—appeared likely to introduce worse 
biases. 

6) It was estimated roughly that 356,390 of the officers com- 
missioned into the World War II Army were not examined with 
the Army General Classification Test. It was assumed that 
the distributions of their standard scores would have been roughly 
normal, with mean and standard deviation comparable to those 
of enrollees in early officer candidate schools (before all such 
enrollees were required to obtain Army standard scores of 110 
or over). For such groups M = 124 and SD = 11, approxi- 
mately. By reference to a table of the normal distribution, the 
numbers of these who would have fallen in Army Grades I, II, 
and III were estimated and added to the totals of the three 
consolidations of actual data. No attempt was made to estimate 
the distribution of Army Grades of the officers and enlisted 
men who were already in the Army on November 1, 1940, since 
the most reasonable assumption that could have been made was 
that this distribution would approximate that of the men who 
came in after that date. 
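
The allocation of the untested officers in paragraph 6 can be approximated with the normal distribution in Python's standard library. The sketch rests on assumptions not stated in the text: the grade boundaries are placed at standard scores 109.5 and 129.5, and the very small tail below Grade III is counted with Grade III. The function name `officer_grade_counts` is hypothetical.

```python
from statistics import NormalDist


def officer_grade_counts(total=356_390, mean=124.0, sd=11.0):
    """Estimated numbers of untested officers in Army Grades I, II and III."""
    dist = NormalDist(mu=mean, sigma=sd)
    below_ii = dist.cdf(109.5)    # assumed boundary between Grades III and II
    below_i = dist.cdf(129.5)     # assumed boundary between Grades II and I
    shares = {
        "I": 1.0 - below_i,
        "II": below_i - below_ii,
        "III": below_ii,          # includes the small tail below Grade III
    }
    return {grade: round(total * share) for grade, share in shares.items()}


officers = officer_grade_counts()
```

Under these assumptions roughly three-tenths of the group fall in Grade I, about six-tenths in Grade II, and the remainder in Grade III.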

7) The adjusted Army Grade distribution of the total group 
was as follows: 








Army Grade    Number       Per cent 
I             652,777      6.99 
II            2,566,361    27.48 
III           2,800,133    29.98 
IV            2,655,991    28.44 
V             664,024      7.11 
Total         9,339,286    100.00 


Of this total, 858,567, or 9.2 per cent, were colored; the rest were 
white. In order to obtain centile norms, it was necessary to 
estimate the Army standard score distribution corresponding to 
this distribution of Army Grades. 

8) Cumulative percentages derived from the table in para- 
graph 7, above, were plotted on cross-section paper over the 
standard score points 59.5, 89.5, 109.5, and 129.5. A crude 
ogive was drawn through these points with the use of a ship 
curve. This ogive was extended upward to meet the 100 per 
cent line tangentially at standard score 163, which corresponds 
to the perfect raw score on the hardest forms of the test. Though 
the standard score 39 corresponds to the zero raw score on the 
easiest form of the test (Form 1a), the ogive was extended down- 
ward to meet the 0 per cent line tangentially at standard score 
20, in order to avoid the pile-up effect at and just above raw 
score zero. The cumulative percentages were read from the 
ogive at five-point standard score intervals, and from these the 
simple frequencies (in percentages) were obtained by successive 
subtraction of each cumulative percentage from the one above 
it. This gave a rough frequency distribution, with exactly cor- 
rect proportions of the total frequency in each Army Grade. 
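
The reading of the ogive can be imitated numerically. In the sketch below straight-line interpolation is swapped in for the hand-drawn ship-curve ogive, so the frequencies within each grade are cruder than those actually obtained; the function name `interpolate` and the variable names are hypothetical. The percentages are those of the table in paragraph 7.

```python
def interpolate(points, x):
    """Piecewise-linear ogive through (score, cumulative per cent) points."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


grade_per_cent = {"V": 7.11, "IV": 28.44, "III": 29.98, "II": 27.48, "I": 6.99}
bounds = [59.5, 89.5, 109.5, 129.5]       # upper limits of Grades V to II

# Cumulative percentages plotted over the grade boundaries.
cumulative, running = [], 0.0
for grade, bound in zip(["V", "IV", "III", "II"], bounds):
    running += grade_per_cent[grade]
    cumulative.append((bound, running))

# Ogive meets 0 per cent at score 20 and 100 per cent at score 163.
ogive = [(20.0, 0.0)] + cumulative + [(163.0, 100.0)]

# Read the ogive at five-point intervals, then subtract successively.
grid = [19.5 + 5 * k for k in range(30)]            # 19.5, 24.5, ..., 164.5
readings = [interpolate(ogive, x) for x in grid]
frequencies = [hi - lo for lo, hi in zip(readings, readings[1:])]
```

Because the grade boundaries fall on the five-point grid, the interval frequencies within each Army Grade sum to that grade's percentage, as required.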

9) The rough frequency distribution was plotted in pencil on 
one-tenth inch cross-section paper, with one-tenth inch on the 
horizontal scale representing one standard score unit, and one-tenth 
inch on the vertical scale representing one-tenth of one per 
cent of the total frequency. Actual frequency distributions were 
plotted on the same score scale (in five-point intervals) and the 
same percentage frequency scale for the following samples: 

(a) The 644,065 Form 1a and 1b scores covering the period 
from November 1940 through August 1941, in original standard 
scores. 

(b) The Form 1c and 1d scores of the 1,782 men comprising 
the standardization group for these forms (a total of 3,564 
scores), in raw scores. 

(c) The 41,498 Form 1c and 1d scores used in determining 
the adjustment to the second consolidation, in Army standard 
scores. 

(d) A set of 110,859 Army standard scores (Form of test not 
identified) obtained from two cross-sectional samples of the 
Army taken on March 31, 1944 (89,337 cases) and September 30, 
1944 (21,622 cases). Most of the tests in these samples were 
probably Forms 1c and 1d. 

Successive adjustments were then made to the rough frequencies 
of the total distribution on a ‘cut-and-try’ basis by 
lowering some columns and raising others in the same Army 
Grade range. The following criteria guided the making of these 
adjustments: 

(a) The total frequency in each Army Grade range must 
remain constant. 

(b) The distribution curve must pass smoothly through the 
lines separating each Army Grade range from the adjacent ones. 

(c) The total distribution must resemble the four sample dis- 
tributions as closely as possible. 

Under the restrictions of the first two criteria there was prac- 
tically no leeway in the application of the third. Repeated 
trials indicated that no distribution which meets the first two 
criteria exactly and is not quite obviously dissimilar in general 
appearance to the sample distributions can yield a set of centiles 
any one of which differs by more than two points from those 
given in Column (4) of Table II. The only cases in which two- 
point differences were obtained were those in which an actual 
difference of very little more than one point caused a reversal in 
the direction of the rounding-off error. 

10) The purpose of a test is to measure the corresponding 
abilities of those to whom it is applied. The pile-up of frequencies 
in the AGCT samples in the region at and just above 
raw score zero indicates that this test does not measure 
the bottom levels of the ability which it measures throughout the 
rest of the range. The absence of any such pile-up near the 
perfect score indicates that it does measure the top levels of this 
ability fairly well. The distribution curve of any ability which 
is measured adequately throughout its range should approach 
the base-line asymptotically, or at least tangentially, at each 
end. A study of the sample curves showed that if the frequencies 
actually recorded in the region of standard scores 39 through 49 
were distributed smoothly over the region of standard scores 20 
through 39, these curves would approach the base-line tangen- 
tially at standard score 10. The ability measured by the test 
is about equally infrequent at levels above standard score 160 
and below standard score 20. Approximately two persons in 
35,000 fall outside this range, one at each end. The test does 
not measure the ability accurately at standard score levels below 
50; it does not measure it at all at standard score levels below 40. 

11) Since it is the distribution of the ability measured rather 
than the distribution of the scores themselves which is of pri- 
mary interest, the scores in Army Grade V were distributed 
over the region from standard scores 20 through 59 in preparing 
the distribution from which the centile norms were derived. 
The mean of this distribution is 97.3 and the standard deviation 
is 24.0. If the distribution had been cut off at standard score 
39, corresponding to raw score 0, the mean would have been 
slightly higher and the standard deviation slightly lower. The 
median, which is not affected by this procedure, is 99.8. It may 
be noted in Table II that the 50th centile is given as 99 rather 
than as 100. A centile is defined here as a score—the score at or 
below which the given percentage of individuals lie. A per- 
centile is a point—the point below which the given percentage 
of individuals lie. In the present instance, fifty per cent of the 
group have abilities represented by scores below 99.8, and 
approximately fifty per cent obtain standard scores below 100; 
i.e. standard scores of 99 or lower. 
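
The distinction between a centile (a score) and a percentile (a point) can be illustrated with a small invented distribution. The function names and the counts are hypothetical, and the percentile sketch assumes that an obtained score s represents abilities from s up to but not including s + 1, following the statement above that scores below 100 are scores of 99 or lower.

```python
def centile(score_counts, per_cent):
    """Score at or below which at least per_cent of the cases lie."""
    total = sum(score_counts.values())
    running = 0
    for score in sorted(score_counts):
        running += score_counts[score]
        if 100.0 * running / total >= per_cent:
            return score
    return max(score_counts)


def percentile_point(score_counts, per_cent):
    """Point below which per_cent of the cases lie (interpolated)."""
    total = sum(score_counts.values())
    target = total * per_cent / 100.0
    running = 0
    for score in sorted(score_counts):
        count = score_counts[score]
        if running + count >= target:
            # Assumption: score s covers the interval from s to s + 1.
            return score + (target - running) / count
        running += count
    return max(score_counts) + 1.0


# Invented frequencies of obtained standard scores (100 cases).
score_counts = {97: 20, 98: 14, 99: 17, 100: 25, 101: 24}
fiftieth_centile = centile(score_counts, 50)
fiftieth_percentile = percentile_point(score_counts, 50)
```

In this invented case the 50th centile is the score 99, while the 50th percentile is the point 99.94, just as a median of 99.8 goes with a 50th centile of 99 in the actual norms.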

12) Forms 1c and 1d, as has been noted previously, were 
released in November 1941, and it was estimated that by January 
1942 their use had exceeded that of Forms 1a and 1b. 
Form 3a was introduced in April 1945, and by May 1 it had almost 
entirely superseded Forms 1c and 1d for initial testing. Form 
3b was not issued until some time after the end of the war. 
According to the best available estimates, the test forms on 
which these centile norms are based were used in about the 
following numbers: 


Form         Number       Per cent 
1a and 1b    962,399      10.3 
1c and 1d    8,029,256    86.0 
3a           347,631      3.7 
Total        9,339,286    100.0 


TABLE I.—INDICES OF DIFFICULTY (PER CENT CORRECT) AND 
DISCRIMINATION (TETRACHORIC CORRELATION WITH TOTAL 
SCORE) OF THE ITEMS OF THE ARMY GENERAL CLASSIFICATION 
TEST, FORMS 1a AND 1b 

   Vocab-1a        Arith-1a      Block Counting-1a & 1b      Vocab-1b        Arith-1b
Item            Item               Item   Item            Item            Item
 No.  Dif  Dis   No.  Dif  Dis   No.-1a No.-1b  Dif  Dis   No.  Dif  Dis   No.  Dif  Dis

   1              11   96  .46       21     18       .49     1   91  .85     9   99  .70
   2              12   95  .38       22     17   98          2   90  .89    10   98  .40
   3   96  .60    13   98  .59       23     19   96  .50     3   89  .82    11   97  .35
   4   96  .48    14   93  .38       24     20   96  .39     4   89  .87    12   96  .80
   5   95  .60    15   96  .57       25     23   94  .44     5   87  .89    13   94  .35
   6   96  .60    16   94  .50       26     21   93  .42     6   86  .83    14   92  .46
   7   92  .55    17   88  .30       27     41   83  .57     7   84  .79    15   91  .75
   8   95  .46    18   91  .55       28     22   92  .46     8   84  .82    16   91  .82
   9   96  .41    19   95  .50       29     24   90  .40    25   82  .85    31   90  .52
  10   90  .54    20   95  .50       30     60   82  .54    26   80  .82    32   90  .70
  31   87  .50    36   96  .48       41     42   80  .64    27   80  .68    33   90  .70
  32   91  .48    37   89  .50       42     38   84  .66    28   79  .70    34   90  .83
  33   92  .46    38   89  .33       43     59   80  .66    29   72  .83    35   88  .67
  34   90  .48    39   92  .48       44     39   86  .53    30   71  .72    36   87  .68
  35   92  .51    40   92  .44       45    147   31  .25    43   71  .76    49   86  .77
  46   87  .47    51   91  .53       56    109   66  .41    44   68  .68    50   86  .77
  47   87  .60    52   90  .60       57     75   75  .72    45   68  .70    51   85  .63
  48   82  .52    53   87  .55       58     55   83  .64    46   68  .78    52   85  .75
  49   84  .55    54   96  .60       59     76   83  .21    47   67  .77    53   85  .75
  50   74  .47    55   86  .57       60     56   81  .58    48   65  .74    54   84  .82
  61   83  .32    66   96  .61       71     77   79  .64    61   64  .83    67   83  .74
  62   67  .64    67   87  .51       72     93   70  .52    62   64  .63    68   82  .85
  63   85  .56    68   84  .63       73    110   70  .47    63   62  .64    69   81  .78
  64   83  .38    69   85  .56       74     40   80  .53    64   60  .48    70   78  .76
  65   78  .61    70   88  .53       75     92   74  .51    65   59  .48    71   78  .67
  76   75  .30    81   81  .58       86     37   90  .45    66   59  .47    72   78  .83
  77   70  .63    82   84  .54       87     94   73  .51    79   58  .73    85   77  .78
  78   71  .47    83   82  .56       88     78   75  .68    80   57  .44    86   76  .87
  79   73  .65    84   82  .62       89     91   71  .65    81   56  .53    87   76  .68
  80   56  .53    85   75  .44       90    114   58  .37    82   56  .59    88   74  .56
  91   50  .44    96   74  .60      101     74   81  .36    83   55  .67    89   72  .68
  92   51  .38    97   73  .64      102     95   74  .34    84   55  .42    90   71  .78
  93   62  .52    98   72  .64      103     58   79  .64    97   54  .49   103   70  .71
  94   44  .25    99   63  .55      104    113   72  .24    98   53  .28   104   69  .89
  95   46  .52   100   68  .52      105    112   64  .44    99   53  .67   105   65  .73
 106   22  .16   111   60  .72      116    148   30  .31   100   52  .43   106   64  .68
 107   57  .61   112   42  .53      117    111   68  .43   101   51  .42   107   64  .67
 108   70  .69   113   52  .59      118    128   53  .59   102   51  .63   108   59  .75
 109   45  .42   114   72  .65      119     57   78  .42   115   51  .52   121   59  .64
 110   37  .22   115   50  .66      120    149   41  .14   116   50  .34   122   59  .77
 121   28  .26   126   67  .46      131    146   30  .32   117   48  .58   123   54  .74
 122   20  .24   127   62  .73      132    127   48  .51   118   48  .47   124   54  .88
 123   39  .25   128   66  .71      133     73   78  .39   119   47  .63   125   50  .54
 124   34  .53   129   31  .53      134    145   35  .45   120   44  .58   126   45  .59
125 6|.02; 130 | 32|.57 135 130 60/.25] 133 | 44/.37] 139 | 45/.67 
136 24/).30| 141 | 67/.54 146 129 44/.51| 134 | 43/.59} 140 | 41].34 
137 12}.18] 142 | 65|.72 147 96 57|.38] 135 | 41).29] 141 | 40/.48 
138 28/.38] 143 | 34/.39 148 131 44/.56] 136 | 40}.35| 142 | 37/.38 
139 36}.38} 144 | 38/).55 149 150 22/.35| 137 | 28].21) 143 | 36).37 
140 25/.40} 145 | 38/.47 150 132 29/.36] 138 | 26].21] 144 | 30].20 
    1           2            3           4
             AGCT-
             1a & 1b      Adjusted
 Army        Original     Army
 Standard    Standard     Standard     Army
 Score       Score        Score        Centile
  160         157          172           *
  159         156          170           *
  158         154          168           *
  157         153          166           *
  156         152          164           *
  155         151          163           *
  154         150          161           *
  153         149          159           *
  152         148          157           *
  151         147          155           *
  150         146          154           *
  149         145          152           *
  148         144          151          99
  147         143          150          99
  146         142          149          99
  145         141          148          99
  144         140          147          99
  143         140          146          99
  142         139          145          99
  141         138          144          98
  140         137          143          98
  139         136          142          98
  138         135          141          98
  137         134          140          97
  136         133          139          97
  135         132          138          96
  134         131          137          96
  133         130          136          95
  132         130          135          95
  131         129          134          94
  130         128          133          94
  129         127          131          93
  128         126          130          92

* above 99.5 
TABLE II.—ARMY GENERAL CLASSIFICATION TEST 
ARMY STANDARD SCORES, CENTILE NORMS AND EQUIVALENT 
SCORES ON CERTAIN OTHER TESTS 
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AGCT- 
1a 
Raw 
Score 


145 
144 
142 
140 
139 
138 
136 
135 
134 
133 
132 
130 
129 
128 
127 
126 
125 
124 
123 
122 
121 
119 
118 
117 
116 
115 
114 
113 
112 
111 
110 
109 
108 
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AGCT- 
1b 
Raw 
Score 


148 
146 
145 
143 
142 
141 
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138 
137 
136 
135 
133 
132 
131 
129 
128 
127 
126 
125 
124 
122 
121 
120 
118 
117 
116 
115 
114 
113 
112 
111 
110 
Otis       Wells 
SA         Alpha      ACE 
Higher     Five       1942 
Raw        Raw        Raw 
Score      Score      Score 


74 
72 
71 
69 
68 
67 
66 
65 


62 
61 


58 
57 
56 
55 


53 
52 
51 


49 
48 


199 
198 
198 
197 
196 
195 
194 
193 
192 
191 
189 
188 
186 
185 
183 
182 
180 
178 
176 
174 
172 
170 
168 
166 


1942 


167 
166 
165 
164 
163 
162 
161 
159 
158 
157 
155 
154 
153 
151 
150 
148 
147 
145 
143 
142 
140 
138 
136 
135 
133 
131 
129 
127 
125 
123 
121 
119 
117 
    1           2            3           4
             AGCT-
             1a & 1b      Adjusted
 Army        Original     Army
 Standard    Standard     Standard     Army
 Score       Score        Score        Centile
  127         126          129          91
  126         125          127          90
  125         124          126          89
  124         123          124          88
  123         122          123          87
  122         121          122          85
  121         120          121          84
  120         120          120          83
  119         119          119          81
  118         118          117          80
  117         117          116          78
  116         116          115          77
  115         115          114          75
  114         114          113          74
  113         113          112          72
  112         112          111          70
  111         111          110          69
  110         111          109          67
  109         110          108          65
  108         109          106          64
  107         108          105          62
  106         107          104          61
  105         106          103          59
  104         105          102          57
  103         104          101          56
  102         103          100          54
  101         103           99          53
  100         102           98          51
   99         101           97          50
   98         100           96          48
   97          99           95          47
   96          98           94          45
   95          97           93          44
   94          96           92          42


5 


AGCT- 


1a 
Raw 


Score 


107 
106 
105 
103 
102 
101 
100 
99 
98 
97 
96 
95 
94 
92 
91 
90 
89 
88 
87 
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AGCT- 


1b 


Raw 
Score 


109 
108 
107 
105 
104 
103 
102 
101 
100 
99 
97 
96 
95 
93 
92 
91 
90 
89 
88 
87 
86 
84 
83 
82 
81 
80 
79 
78 
76 
75 
74 
73 
71 
70 
TABLE II.—ARMY GENERAL CLASSIFICATION TEST 
ARMY STANDARD SCORES, CENTILE NORMS AND EQUIVALENT 
SCORES ON CERTAIN OTHER TESTS—(Continued) 
Otis       Wells 
SA         Alpha      ACE 
Higher     Five       1942 
Raw        Raw        Raw 
Score      Score      Score 


48 
47 
46 
45 
44 
43 
42 
42 
41 
41 
40 
39 
38 
37 
36 
36 
35 
35 
34 
34 
33 
32 
32 
31 
30 
30 
29 
29 
28 
28 
27 
26 
26 
25 


163 
161 
159 
157 
154 
152 
150 
148 
145 
143 
141 
138 
136 
134 
132 
130 
127 
125 
122 
120 
117 
115 
113 
111 
109 
106 
104 
102 
100 

97 

95 

93 

91 

89 


115 
113 
111 
109 
107 
105 
103 
101 
99 
97 
95 
93 
91 
89 
87 
85 
84 
82 
80 
78 
76 
74 
73 
71 
69 
67 
66 
64 
63 
61 
59 
58 
57 
55 
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ACE 
1942 
Col- 
lege 
Cen- 
tile 
69 
66 
63 
60 
56 
53 
49 
46 
43 
40 
37 
34 
31 
28 
26 
23 
22 
20 
18 
16 
14 
13 


[remaining ACE college centile entries on this page are illegible in the source] 

1 2 3 4 
AGCT- 
1a&1b Ad- 
Origi- justed 
Army nal Army 
Stand- Stand- Stand- Army 
ard ard ard Cen- 
Score Score Score tile 
93 95 91 41 
92 94 90 40 
91 93 89 38 
90 92 88 37 
89 91 87 35 
88 89 86 34 
87 88 85 33 
86 87 84 32 
85 86 83 31 
84 85 82 29 
83 84 82 28 
82 83 81 27 
81 82 80 26 
80 80 80 25 
79 79 79 24 
78 78 78 23 
77 77 77 22 
76 75 76 21 
75 73 75 20 
74 72 74 19 
73 71 73 18 
72 70 72 17 
71 69 72 16 
70 67 71 15 
69 66 70 14 
68 65 69 13 
67 63 68 13 
66 62 67 12 
65 61 66 11 
64 60 65 10 
63 58 64 10 
62 57 64 9 
61 56 63 8 
60 55 62 8 
TABLE II.—ARMY GENERAL CLASSIFICATION TEST 
ARMY STANDARD SCORES, CENTILE NORMS AND EQUIVALENT 
SCORES ON CERTAIN OTHER TESTS—(Continued) 
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AGCT- 
1a 
Raw 
Score 


69 
68 
66 
65 
64 
62 
61 
59 
58 
57 
55 
54 
53 
51 
49 
48 
46 
44 
42 
41 
39 
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27 
26 
24 
23 
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AGCT- 
1b 
Raw 
Score 


69 
67 
66 
64 
63 
61 
59 
58 
57 
55 
54 
53 
51 
49 
47 
46 
44 
42 
40 
38 
36 
35 
33 
31 
30 
29 
27 
25 
24 
22 
20 
19 
17 
16 
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Otis Wells 
SA Alpha ACE 
Higher Five 
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1942 


Raw Raw Raw 


Score Score Score 


25 
24 
23 
23 
22 
21 
21 
20 
20 
19 
18 
18 
17 
16 
16 
15 
15 
14 
13 
12 
11 
11 
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87 
85 
83 
81 
79 
77 
75 
73 
71 
69 
67 
66 
64 
62 
60 
58 
56 
54 
52 
51 
49 
47 
45 
43 
41 
40 
38 
36 
34 
33 
31 
29 
28 
26 


54 
52 
51 
50 
49 
47 
46 
45 
44 
42 
41 
40 
39 
38 
37 
36 
34 
33 
32 
31 
30 
29 
28 
27 
26 
24 
23 
22 
21 
20 
19 
18 
17 
16 
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ACE 
1942 
Col- 
lege 


Cen- 
tile 
4 
3 
3 
2 
2 
2 
2 
2 
2 
1 
1 
1 
1 
1 
1 
1 
1 
1 
† 
† 
† 
† 
† 
† 
† 
† 
† 
† 
† 
† 
† 
† 
† 
ARMY STANDARD SCORES, CENTILE NORMS, AND EQUIVALENT 
SCORES ON CERTAIN OTHER TESTS—(Continued) 

    1           2            3           4
             AGCT-
             1a & 1b      Adjusted
 Army        Original     Army
 Standard    Standard     Standard     Army
 Score       Score        Score        Centile
   59          54           61           7
   58          53           60           7
   57          52           59           6
   56          51           58           6
   55          50           58           5
   54          49           57           5
   53          48           57           4
   52          47           56           4
   51          47           56           4
   50          46           55           3
   49          45           55           3
   48          45           55           3
   47          44           54           3
   46          44           54           2
   45          43           53           2
   44          43           53           2
   43          42           52           2
   42          42           52           2
   41          41           51           1
   40          41           51           1

† below 0.5 
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AGCT- 
1a 
Raw 
Score 


19 
18 
16 
15 
14 
13 
11 
10 
10 
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AGCT- 
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Raw 
Score 


15 
13 
12 
10 
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Score Score Score tile 
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THE RELATION BETWEEN SCORES OBTAINED 
BY HARVARD FRESHMEN 
ON THE KUDER PREFERENCE RECORD 
AND THEIR FIELDS OF CONCENTRATION 


ANDREW R. BAGGALEY 
Office of Tests, Harvard University 


In the early part of the student’s college career he is confronted 
by the necessity of selecting a field of concentration. The prob- 
lem of the present study was to discover whether the Kuder 
Preference Record can be of any help in this connection. Inas- 
much as it has been widely used as a device for the determination 
of occupational interests, it seemed reasonable to suppose that it 
might also serve the related purpose of detecting a student’s 
academic interests. In the Manual of the test similar investiga- 
tions by Kuder, Mangold, and Yum are cited. 

For the present study the Kuder Preference Record data were 
available on one hundred eighty-five Harvard College students 
at the close of their freshman year. Also available were the 
choices of fields of concentration these men are required to make 
before beginning their sophomore year. Table 1 shows an analysis 
of the scores for each field group. 

These results were compared with those of two studies of 
groups of students enrolled in different divisions of the University 
of Chicago. The results of the present study agree with those of 
Yum’s‘ in that students in the physical and biological sciences 
tend to score higher on the scientific scale than social science 
division students and the latter score higher on the literary scale. 
However, the biology concentrators are somewhat lower than the 
physical science men on the scientific scale. The results are also 
in accord with those of another study! in that the physical science 
group is high on the scientific and computational scales, the 
economics concentrators (as compared with those taking the 
business course at Chicago) are high on the persuasive and 
computational scales, and the humanities group is high on the 
artistic, literary, and musical scales. 

A subjective analysis of the Harvard data was made, and on 
the basis of the preference scores the fields seemed to fall into 
two groups (the data of the four fields with fewer than three 
cases apiece were eliminated): Group A—anthropology, biochemistry, 
biology, chemistry, engineering science and applied physics, 
mathematics, and physics; Group B—architecture, economics, English, 
fine arts, geology, government, history, history and literature, 
philosophy, Romance languages, Slavic languages, and social relations. 
Group A corresponds to the natural science area, and Group B 


TABLE 1.—FREQUENCY AND SCALE MEANS FOR EACH FIELD 
OF CONCENTRATION* 

Field of Concentration        n    1    2    3    4    5    6    7    8    9

Anthropology                  3   74   28   72   62   43   49   19   90   42
Architecture                  5   86   29   61   59   85   58   15   49   44
Biochemistry                  7   76   35   86   62   40   57   20   76   37
Biology                       8   68   26   76   68   56   51   22   75   34
Chemistry                     3   82   32   83   65   60   49   22   64   32
Classics                      1   28   18   59   44   38   56   16   26   27
Economics                    30   60   39   58   90   42   60   20   65   57
Engineering Science           2   75   29   60   73   34   45   24   71   54
Engineering Science and
  Applied Physics            11   82   39   79   72   45   54   21   59   44
English                      15   50   18   47   74   55   79   27   63   50
Fine Arts                     3   80   26   42   71   76   49   25   58   52
Geology                       4   65   34   61   74   47   60   15   72   48
Government                   39   55   30   52   82   44   63   21   73   55
History                       9   50   31   55   77   46   68   24   74   52
History and Literature        5   49   27   45   68   62   82   33   49   54
Mathematics                   4   78   42   82   71   45   56   22   62   41
Philosophy                    3   55   33   67   60   56   69   21   60   54
Physical Science              1   80   60   78   83   30   48   18   63   59
Physics                       8   88   39   89   57   48   50   22   61   43
Psychology                    1   54   16   96   62   54   30   27  103   29
Romance Languages             4   45   26   53   52   57   48   19   63   58
Slavic Languages              3   35   27   56   63   64   75   33   70   48
Social Relations             16   58   30   62   73   46   56   24   82   49

Group A                      44   78   35   81   66   48   52   21   68   46
Group B                     136   57   31   55   78   49   63   22   68   53

* Key for the Scales: 1. Mechanical. 2. Computational. 3. Scientific. 
4. Persuasive. 5. Artistic. 6. Literary. 7. Musical. 8. Social Service. 
9. Clerical. 
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corresponds to the social studies and humanities areas with one 
significant exception: geology falls into Group B. 

A formula for computing Fisher’s discriminant function on the 
basis of these two groupings was derived. The discriminant 
function is a method of determining a set of weights for the linear 
combination of a number of measures in order that a maximum 
separation between two groups of cases may be made on the basis 
of the compound measurements thus obtained. Travers* used 
it to separate twenty successful engineer apprentices from twenty 
successful air pilots on the basis of six tests. Selover? used it to 
separate pairs of fields of concentration on the basis of the sopho- 
more testing program given to University of Minnesota students. 
The weights which best separate the two groups of this study are 
given in Table 2. 


TABLE 2.—SCALE WEIGHTS FOR COMPUTING DISCRIMINANT FUNCTION 

Scale             Weight      Scale             Weight 
Mechanical          13        Literary            -7 
Computational       22        Musical             18 
Scientific          58        Social Service       0 
Persuasive          -2        Clerical           -31 
Artistic             2 


When the weights are compared with the differences between 
the scale means of the two groups, it will be noted that the 
weights for the scientific and clerical scales are large as expected. 
However, some of the weights are not such as would be expected, 
notably the high weight for the musical scale. One might 
suppose that the persuasive, literary, and social service scales 
would distinguish a ‘non-scientific’ group to a great degree; but 
such a hypothesis is not supported by the responses of these 
subjects. 

A discriminant function value was computed for each of the 
one hundred eighty subjects. The mean discriminant function 
score for each field is presented in Table 3. While there is much 
overlapping among the individual cases of the two groups, it will 
be noted that the field means of the two groups do not overlap at 
all. In fact there is a considerable gap between the means for 
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anthropology, the lowest of Group A, and architecture, the 
highest of Group B. 
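
As an illustration of how such a value is formed, the weighted sum can be sketched as follows. The weights are those of Table 2; applying them to the rounded scale means of Table 1, rather than to an individual's scores, is our own choice for the example, and the function and variable names are hypothetical. The article does not say whether a constant was added or how the weights were rounded, so only approximate agreement with the field means of Table 3 should be expected.

```python
# Scale weights as printed in Table 2.
WEIGHTS = {
    "Mechanical": 13, "Computational": 22, "Scientific": 58,
    "Persuasive": -2, "Artistic": 2, "Literary": -7,
    "Musical": 18, "Social Service": 0, "Clerical": -31,
}

def discriminant_score(scores):
    """Weighted sum of the nine Kuder scale scores."""
    return sum(WEIGHTS[scale] * scores[scale] for scale in WEIGHTS)

# Scale means for two fields, read from Table 1 (rounded values).
physics = {"Mechanical": 88, "Computational": 39, "Scientific": 89,
           "Persuasive": 57, "Artistic": 48, "Literary": 50,
           "Musical": 22, "Social Service": 61, "Clerical": 43}
english = {"Mechanical": 50, "Computational": 18, "Scientific": 47,
           "Persuasive": 74, "Artistic": 55, "Literary": 79,
           "Musical": 27, "Social Service": 63, "Clerical": 50}

physics_score = discriminant_score(physics)   # 5859
english_score = discriminant_score(english)   # 2117
print(physics_score, english_score)
```

These two values lie within about forty points of the corresponding Table 3 means (5842 and 2154), which is the kind of agreement one would expect from rounded inputs.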

The mean and standard deviation values for Groups A and B 
are shown in Table 4. The difference between the mean of 
Group A and the mean of Group B yields a t-value of 12.13. The 
probability of obtaining a t-value as large or larger is less than .001 
if random fluctuation is the only cause of difference; or, to state 
the matter another way, it appears that the Kuder scores provide 
a sound basis for differentiating among Harvard students who 
propose to concentrate in different academic fields. 
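
The article does not state which formula produced the t-value. One plausible reading, offered here only as a sketch, is a pooled-variance t in which the Table 4 standard deviations were computed with divisor n; that assumption and the function name are ours. With the group sizes of Table 1 (44 and 136) it comes out close to the reported figure.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t for two independent groups.

    Assumes (our assumption) that each SD was computed with divisor n,
    so n * sd**2 recovers the sum of squared deviations.
    """
    pooled_var = (n_a * sd_a**2 + n_b * sd_b**2) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error

# Means and SDs from Table 4; group sizes from Table 1.
t_value = pooled_t(5256, 1052, 44, 2825, 1180, 136)
print(round(t_value, 1))
```

The result is about 12.1, against the printed 12.13; the small gap could come from rounding in Table 4 or from a slightly different formula.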


TABLE 3.—DISCRIMINANT FUNCTION MEAN FOR EACH 
FIELD OF CONCENTRATION 

Group A                                Group B 

Physics                     5842       Architecture              3870 
Chemistry                   5653       Philosophy                3512 
Biochemistry                5497       Social Relations          3497 
Mathematics                 5373       Geology                   3439 
Engineering Science and                Economics                 3074 
  Applied Physics           5072       Slavic Languages          2871 
Biology                     4801       History                   2834 
Anthropology                4434       Fine Arts                 2562 
                                       Government                2514 
                                       Romance Languages         2437 
                                       History and Literature    2173 
                                       English                   2154 


TABLE 4.—MEAN AND STANDARD DEVIATION OF DISCRIMINANT 
FUNCTION FOR Two GROUPS 


Group A Group B 


Mean 5256 2825 
Standard Deviation 1052 1180 


For practical purposes a critical score had to be established to 
differentiate the two groups. The objective was to minimize the 
proportion of ‘misplaced’ cases in each group. By a process of 
trial-and-error the score 3988 was obtained. 13.6 per cent of the 
cases in Group A lie below this score; and 14 per cent of the cases 
in Group B, above it. There is only one field in which more than 
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one-third of the cases are misplaced. Three of the five architec- 
ture concentrators have scores higher than 4200, but the field 
mean is lowered to 3870 principally by a single score of 2370. 
There is much to be said for placing architecture in Group A. 
On the other hand, in twelve of the fields less than twenty per cent 
of the students fall into the ‘wrong’ major category. 

There are two principal limitations to be kept in mind when 
using the present results. First, the data are ‘loaded’ in the 
sense that Groups A and B were subjectively determined from 
inspection of the Kuder scores themselves, so that application 
of the obtained discriminant function weights to a new group of 
students would probably produce more overlapping than has been 
observed in the group studied. Second, the results were calcu- 
lated on the basis of the first selection of a field of concentration. 
A set of weights obtained by discriminating groups of fields in 
which students graduate from college would be likely to yield 
more valid predictions of satisfaction. Nevertheless, the results 
of this study suggest that the Kuder Preference Record is capable 
of providing useful information to the student who is undecided 
about his field of concentration and that further investigation 
of the instrument along lines similar to those adopted here would 
be fruitful. 

The student’s choice could be narrowed down to a smaller list 
of fields if other sets of weights were calculated for discriminating 
subgroups of these cases. However, the use of such subgroups 
would be precarious in the absence of more extensive data. 

The main use of the above results is, of course, as a first 
approximation to an answer to the student seeking the field of 
concentration in which his interest is likely to be high. The 
higher the per cent value of the student’s discriminant function 
score in Table 5, the more confident is the tester in placing him in 
one of the two groups. 

There is another practical application of the results. Selover? 
(p. 463) points out that ‘‘ . . . cases difficult to classify according 
to major group on the basis of test performance do not indicate 
a weakness of the tests, nor of the method. Rather the value of 
the method for locating students who need guidance is illus- 
trated ...’’ The more extreme of the twenty-five misplaced 
cases should be investigated by interview. It is probable that 
the engineering science and applied physics concentrator who 
TABLE 5.—CUMULATIVE FREQUENCY DISTRIBUTIONS FOR THE 
TWO GROUPS 

Discriminant     Cases in Group A          Cases in Group B 
Function         Lower Than the Score      Higher Than the Score 
Score            Number    Per Cent        Number    Per Cent 
    0               0         0.0            136       100.0 
  400               0         0.0            135        99.3 
  800               0         0.0            128        94.1 
 1200               0         0.0            126        92.6 
 1600               0         0.0            112        82.4 
 2000               0         0.0            101        74.3 
 2400               1         2.3             91        66.2 
 2800               1         2.3             71        52.2 
 3200               2         4.5             53        39.0 
 3600               3         6.8             30        22.1 
 4000               7        15.9             18        13.2 
 4400              10        22.7             11         8.1 
 4800              11        25.0              9         6.6 
 5200              17        38.6              4         2.9 
 5600              25        56.8              2         1.5 
 6000              33        75.0              1         0.7 
 6400              39        88.6              0         0.0 
 6800              43        97.7              0         0.0 
 7200              44       100.0              0         0.0 



obtained a discriminant function score of 2337 chose this field for 
reasons other than genuine interest. 
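
The trial-and-error search for a critical score described earlier can be imitated, roughly, from the Table 5 figures. The sketch below picks the tabled score at which the larger of the two misplacement percentages is smallest; that criterion is our reading of "minimize the proportion of misplaced cases in each group", not a rule stated in the article, and the 400-point grid of Table 5 is much coarser than the individual scores the author worked from.

```python
# (score, per cent of Group A below, per cent of Group B above), from Table 5.
TABLE_5 = [
    (0, 0.0, 100.0), (400, 0.0, 99.3), (800, 0.0, 94.1), (1200, 0.0, 92.6),
    (1600, 0.0, 82.4), (2000, 0.0, 74.3), (2400, 2.3, 66.2), (2800, 2.3, 52.2),
    (3200, 4.5, 39.0), (3600, 6.8, 22.1), (4000, 15.9, 13.2), (4400, 22.7, 8.1),
    (4800, 25.0, 6.6), (5200, 38.6, 2.9), (5600, 56.8, 1.5), (6000, 75.0, 0.7),
    (6400, 88.6, 0.0), (6800, 97.7, 0.0), (7200, 100.0, 0.0),
]

def balanced_cutoff(rows):
    """Row whose worse misplacement percentage (A below, B above) is smallest."""
    return min(rows, key=lambda row: max(row[1], row[2]))

best = balanced_cutoff(TABLE_5)
print(best)  # (4000, 15.9, 13.2)
```

On this grid the choice falls at 4000, next to the critical score of 3988 reported in the text, where the two groups' misplacement rates are 13.6 and 14 per cent.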

This study was conducted to enable predictions of the amount 
of interest college students will show in various fields of concen- 
tration on the basis of their Kuder Preference Record scores. A 
set of nine weights was computed, by means of which a student 
can be placed in one of two groups of fields. Keeping certain 
limitations in mind, the tester can by using the results obtained 
minimize academic dissatisfaction among students who seek his 
aid. 
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GROUP AND INDIVIDUAL VARIABILITY 
ON THE GOODENOUGH DRAW-A-MAN TEST* 


HAROLD GRIER McCURDY 


Meredith College 


Variability of performance in a standard situation is fully as 
interesting as consistency, however deplorable it may appear to 
the constructors and practical users of tests. It is, of course, a 
highly important aspect of the scientist’s job to invent measuring 
devices which will yield maximally dependable results, and to 
use these devices with circumspection. Even with crude and 
unreliable measuring instruments, however, the psychologist will 
discover numerous variations in his measures which cannot be 
attributed simply to the faultiness of the instruments. It is to be 
expected that such variations will become more conspicuous as 
the measurements become more precise. A number of studies 
paying attention to this important matter, in the case of the 
Goodenough test and others, are included in the list of references 
(1, 5, 6, 7, 8, 10). 

The scoring reliability of the Goodenough Draw-a-Man Test 
appears to be reasonably high. McCarthy, while properly 
emphasizing the dangers of misjudgment in individual cases, 
found that on the whole her trained scorers were self-consistent 
to the extent of an r of .94 when scoring the same drawings, even 
after some lapse of time. The present writer, checking his own 
scorings against the Goodenough scorings for the sixteen 
test drawings on pages 156-159 of her manual,* found that 
his estimates correlated with hers .99, being one to three points 
lower in five instances and one and two points higher in two 
instances. 

In spite of the stress which McCarthy places on scoring vari- 
ations in the use of the Goodenough test, it is clear from her study 
that variations in the drawing performance itself are noticeably 
more sizable. When three hundred eighty-six third- and fourth- 





* Miss Mollie Fearing assisted in collecting the drawings by first-grade 
children used in this study. 
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grade children were tested twice at a week’s interval and scoring 
unreliability was minimized by having the drawings scored twice 
without benefit of comparison, the coefficient of correlation 
between test and retest was .68. In nearly forty-two per cent 
of the cases the MA score changed a year or more. McCarthy 
points out, in reviewing prior work, that the correlation of .68 
between test and retest in her experiment is more like the result 
obtained by Hinrichs, who reports a correlation of .65 for 
seventeen cases tested at varying intervals from six months to 
two and a half years,‘ than that obtained by Goodenough, who 
reports a correlation of .94 for one hundred ninety-four first- 
graders tested twice at the interval of a day,* or than that 
obtained by Smith, whose correlations ranged from .84 to .96 
for large numbers of children tested and retested on the same 
day.® 

The total impression received from McCarthy’s work and her 
review of the literature is that when the interval between tests is 
a day or less there is a higher correlation than when the interval 
is longer. This impression is confirmed by the more recent work 
of McHugh.’ He reports that consecutive Goodenough trials in 
the same testing period yielded correlations of .91 and .83 for, 
respectively, eighty-three and ninety kindergarten children. In 
contrast, the correlation figure (for eighty-three children) was 
.46, when an interval of about two months occurred between 
trials. 

Comparison of these various reports leads to the conclusion 
that variability of performance on the Goodenough test is to an 
important degree a function of time. That this is by no means a 
peculiarity of the Goodenough test is evidenced by a number of 
other studies, of which those by Paulsen* and Woodrow” are 


examples. 


DATA OF THE PRESENT STUDY 


The data to be reported here bear upon the question of the 
contribution of time to variability of performance on the Goode- 
nough test. They also open up some other questions which 
deserve to be studied further, particularly the question of the 
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nature of individual variability, and the question of the rela- 
tion between the variability of the individual and that of the 
group. 

A. Data on a group of first-grade children.—The group con- 
sisted of white children in three first-grade rooms of a public 
school located in a somewhat favored section of a somewhat 
favored southern city. At the time of the second administration 
of the test, in May of 1947, their ages ranged from 79 to 91 
months, with a mean age of 83.9 months and an SD of 2.8. They 
comprised both boys and girls. 

The tests were administered by their teachers, who had been 
instructed to follow the directions given by Goodenough in urging 
the children to draw the best man they could. Drawings were 
made on uniform paper and with uniform pencils. The first 
test was administered on February 13 and the second on May 6, 
1947. The drawings were scored by the writer, who avoided 
making comparisons between the two sets while scoring; some 
time elapsed between scorings. As already indicated, the 
writer’s scoring appears to be in harmony with Goodenough’s, 
with a slight tendency toward greater strictness. 

In all, fifty-nine usable pairs of drawings were obtained. The 
mean IQ score for the first set of drawings was 115.2, with an SD 
of 20.8; for the second set, the mean was 117.9, with an SD of 
19.4. The differences between these correlated means and SD’s 
in terms of the standard error are, respectively, .17 and 1.51, and 
are thus insignificant.? The coefficient of correlation between 
the two sets of scores was .69. 

Inspection of the two sets of scores for differences between 
performances for the same individual revealed a range of from 
—26 to +51; that is to say, individuals scored from twenty-six 
less to fifty-one more points on the second trial than on the first. 
Disregarding the signs, the mean amount of change was twelve 
points, with an SD of 10.2. 

B. Data on one child out of the group.—The parents of one boy 
who was a member of the group described have placed at the 
writer’s disposal a large collection of drawings made by their 
child between about two and a half and seven years of age. 
Included in this collection of over a thousand drawings are a 
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good number representing the human figure, in whole or in part. 
It was thought that, though the drawings, with few exceptions, 
were not made at request but quite spontaneously, there might 
be some value in scoring some of them according to the Goode- 
nough scale for the sake of tracing the child’s development and 
thus having a series of scores for one individual to compare with 
the group’s performance. The drawings, it should be noted, 
were produced at home, mostly in pencil, without guidance from 
the parents, and without any deliberate effort to stimulate the 
child’s activity beyond supplying drawing materials and showing 
interest in his work and occasionally asking him if he would like 
to draw. 

Fifty-six drawings of men were picked out of this collection for 
the present study. They represent the majority of the complete 
human male figures in the collection, except that a number of 
drawings were excluded because they were obviously intended as 
comic distortions or were obviously not done with the usual 
amount of care. The process of selection, therefore, tended to 
trim off some of the poorer work, but certainly not so much that 
the range of performances was greatly narrowed. The fifty-six 
drawings were produced in the period between 31 and 83 months 
of age. 

The accompanying graph plots the Goodenough IQ scores 
against months of age. It will be observed that some of the 
points on the graph represent single drawings, others the average 
of two or more. The range of the scores is from 127 to 216; the 
mean, 171.5; the SD, 21.9. 

A natural question is whether the large amount of drawing 
practice had the effect of improving the scores as time went on. 
Apparently, such an effect was present only slightly if at all. 
Correlating the IQ scores with age yielded an r of .14, with a 
sigma of .13. 

The scores, however, appear to be related to time in another 
way. They rise and fall by periods rather than purely at random. 
If the scores are taken in chronological order and arranged in 
odd-even pairs, making twenty-eight pairs, the correlation is .65, 
with a sigma of .11, and thus clearly not around zero. That is to say, there 
is some degree of similarity between temporally adjoining pairs, 
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even though the total range of scores is wide. It happens, in 
fact, that the correlation figure in this case is close to the corre- 
lation of .69 for test-retest with the group. The relatively slow 
fluctuation of performance levels is likewise shown by the graph. 
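
The sigmas attached to the two correlations above are not derived in the text. They are consistent with the classical large-sample formula for the standard error of r, (1 - r^2) / sqrt(N); that this is the formula the writer used is our assumption, and the function name below is hypothetical. A short check:

```python
import math

def sigma_r(r, n):
    """Classical standard error of a correlation coefficient: (1 - r^2) / sqrt(N)."""
    return (1 - r * r) / math.sqrt(n)

# r of IQ score with age, over the fifty-six drawings.
sigma_age = round(sigma_r(0.14, 56), 2)
# r between odd and even members of the twenty-eight chronological pairs.
sigma_pairs = round(sigma_r(0.65, 28), 2)
print(sigma_age, sigma_pairs)  # 0.13 0.11
```

On this reading the correlation with age (.14) is about one sigma from zero, while the odd-even correlation (.65) is roughly six sigmas from zero.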
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Graph showing fluctuation of Goodenough IQ scores in the case of an individual 
child from 31 to 83 months of age. Fifty-six drawings are represented; the 
circles stand for the averages of two or more drawings, the points for single 
drawings. The notations are explained in the text. 


Other fragments of evidence indicate that these fluctuating 
performance levels were not based on superficial causes. Quite 
by accident the Binet IQ (1937 Stanford-Binet) was determined 
near one of the peaks (March 27-28, 1946) and near one of the 
troughs (August 13-14, 1946). The first Binet IQ was 150, and 
the drawing of a man which was requested to accompany this test 
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scored 216. (In the graph this score is averaged in with the 
spontaneous drawing occurring on April 8, which is in the same 
month of age and which scored 159, giving an average of 187.5; 
if the spontaneous drawing preceding it on March 3 had been 
used instead, the average would have been 211.) The second 
Binet IQ was 142. Unfortunately, no drawing was requested on 
this second occasion, but the two spontaneous drawings occurring 
in the month of August average 142.5. The two Binet IQ’s, 
then, are roughly paralleled by the Goodenough IQ’s in the same 
periods. The same sort of parallelism is shown on the graph 
between the spontaneous drawing performances at home and the 
requested performances at school. Thus, the first classroom test 
drawing in February scored 197; the three spontaneous drawings 
occurring in February average 184, the one made two days after 
the school test scoring 197. The second classroom test drawing 
scored 181; there are no drawings in the collection for this 
month of May, but the average of the two nearest drawings, 
April 12 and June 20, is 176.5. The higher and lower classroom 
performances coincide with relatively higher and lower home 
performances. 


DISCUSSION 


From the data of the present study and of earlier ones, it is 
perfectly clear that very considerable variations in performance 
level occur on the Goodenough test after a lapse of several days 
or more. This conclusion is not in conflict with Goodenough’s 
statement about the reliability of the test, because she had 
reference to a brief time interval of a day. But, as McCarthy 
has indicated, it is unwise to claim as small variability for this 
test as for the Binet over any great span of time. The behavior 
measured by the Binet test apparently has greater stability than 
that measured by the Goodenough test, and this seems reasonable 
enough, considering the fact that physical coördination con- 
tributes to the Goodenough results; for, as Ellis has pointed out,¹ 
simple functions exhibit more variability in general than do com- 
plex ones, which is pertinent here if the motor skills involved in 
drawing may be characterized as simpler than the memory and 
reasoning tasks set by the Binet test. To refer once more to the 
performances of the individual child in this study, it is notable 
that where there was a difference of eight points on the Binet 
there was a much larger difference between the drawing scores 
for the two periods. It is likely, of course, that the difference 
in the drawing scores would have been less if the comparison had 
been made between drawings produced in accord with the stand- 
ard Goodenough instructions; but, judging by the harmony 
between spontaneous and requested work described above, it 
seems probable that there would still have been a greater differ- 
ence between the Goodenough scores than between the Binet 
scores. 

A second conclusion based on the present evidence, which is 
slightly more novel, is that the variations in Goodenough scores 
correspond to a somewhat slow periodic fluctuation or quotidian 
variability (to use Woodrow’s term) which is not based on super- 
ficial environmental or attitudinal changes. The graph, pre- 
viously described, brings out the pertinent facts. Not only do 
the Goodenough scores rise and fall by stages, but also there is a 
correspondence between these stages and the Binet scores and 
the Goodenough scores obtained in the classroom situation. 

If we look at the data for both group and individual side by 
side, we see that the Goodenough scores are scattered similarly 
about the mean. The SD of 21.9 for the individual child’s 
fifty-six drawings is not significantly different from the SD’s of 
20.8 and 19.4 for the group of fifty-nine children on their first 
and second tests. It is an interesting question whether this 
particular similarity between the group and the individual would 
be found to hold true generally. Possibly the child in this case, 
since his performance lies at the group’s upper extreme, exhibits 
greater variability than would a child taken from the center of 
the distribution. But that we are touching on something funda- 
mental here is suggested by the calculation of Hull® that the 
range of differences between different traits possessed by the 
individual is over eighty per cent as great as that within a normal 
group. 

Approaching the question of variability from another angle, 
we note that the correlation coefficient of .65 for the twenty-eight 
pairs of temporally adjacent drawings in the case of the indi- 
vidual child is close to the value of .69 obtained from correlating 
the group’s first test with its second. Again, making the same 
comparison in a different way, the mean amount of variation 
between the pairs of drawings for the individual child was 14.3 
IQ points, while for the fifty-nine children on test and retest it 
was 12 IQ points. The standard error of the difference between 
these two uncorrelated means, with SD’s of 9.9 and 10.2, re- 
spectively, is only 1.1, and therefore insignificant.² Once more, 
then, the variability of the individual child and of the group 
proves in this instance to be the same. 


SUMMARY 


The present study, making use of drawings of men produced by 
first-grade children on two occasions about three months apart 
and a series of drawings produced by one child over a period of 
more than four years, agrees with prior work in showing con- 
siderable variation in Goodenough scores after some lapse 
of time. By reference to the series of drawings by the individual 
child, it presents evidence for the view that this variation rests 
upon a periodic fluctuation of ability which affects Binet 
test performance as well as Goodenough, though to a slighter 
degree. Finally, it demonstrates that in this instance the vari- 
ability of the group and the variability of the individual are of 
the same order of magnitude. 


RELATIONS BETWEEN ABILITY 
AND SOCIAL STATUS 
IN A MIDWESTERN COMMUNITY. 
IV: SIZE OF VOCABULARY 


MARY JEAN SCHULMAN AND ROBERT J. HAVIGHURST 


The Committee on Human Development, 


The University of Chicago 


This is one of a series of papers reporting relationships between 
social status and a variety of abilities in boys and girls. In this 
particular study, the Seashore-Eckerson English Recognition 
Vocabulary Test was given to all high-school freshmen and 
sophomores and to all other children born in 1932 who lived in a 
typical midwestern community. Correlation coefficients are 
reported for the relations between vocabulary scores and social 
status, the Thurstone V factor, and scores on the Iowa Silent 
Reading Test. 


THE COMMUNITY 


Midwest is a stable community of six thousand serving as a 
county seat to a surrounding rural area. These have a combined 
population of ten thousand. The population is ninety per cent 
native-born white, with small ethnic groups of Polish and Nor- 
wegian descent. Midwest is typical by a number of census 
criteria of many small midwestern cities. It has small indus- 
tries, yet it serves a rural population. It is not supported by 
any one large institution or industry, nor is it in the shadow of a 
larger city. 

Inhabitants of the community were classified according to 
Warner’s method of social class analysis into five groups or social 
classes, which are designated by letters A to E, with the A group 
the highest in the social scale. Compared with similar groups 
in other cities, these correspond to the upper, upper middle, 
lower middle, upper lower, and lower lower classes, respectively.¹ 





1 For a more complete account of the community and its social structure, 
see: Havighurst, R. J., and Janke, Leota L. “Relations Between Ability and 
Social Status in a Midwestern Community, I.: Ten-year-old Boys and 
Girls.” J. Ed. Psych. 35, 357-68 (1944). 

In addition to classifying these children into five social classes, 
the research staff computed a socio-economic index for the family 
of each child. Four status characteristics were used in comput- 
ing this index: occupation, source of income, house type, and 
dwelling area. The resulting index, called the Index of Status 
Characteristics (ISC), is a number which can be used in comput- 
ing product-moment correlation coefficients. Warner and his 
colleagues found the ISC to correlate very highly with ratings 
of social status based on Warner’s criteria of social participation 
and social reputation.² 

The children tested in the present study included all of those 
born in 1932, the same group on which earlier reports have been 
made.³ Among this group, only three fell in group B, or the 
upper middle class, and there were no children of group A, or 
upper class families. Hence, the statistical comparisons are 
limited to children of the more populous groups. 


THE TESTING OF VOCABULARY 


The Seashore-Eckerson Test of Vocabulary was selected 
because it represents a new method in vocabulary testing— 
sampling from an unabridged rather than from an abridged dic- 
tionary.⁴ Seashore and Eckerson found that the earlier methods 
of measuring vocabulary, based on samplings from abridged dic- 
tionaries,⁵ grossly underestimated the size of a person’s recogni- 
tion vocabulary. 


2 Warner, W. L., Meeker, Marchia L., and Eells, Kenneth S., The Measure- 
ment of Social Status. Chicago: Science Research Associates, 1948. 

3 Reference 1. Also Havighurst, R. J., and Breese, Fay H., “Relations 
between Ability and Social Status in a Midwestern Community. III: 
Primary Mental Abilities.” J. Ed. Psych., 38, 241-247 (1947). 

4 Seashore, Robert H., and Eckerson, Lois D., “The Measurement of 
Individual Differences in General English Vocabularies.” J. Ed. Psych., 31, 
14-38 (1940). 

Size of Recognition Vocabularies Among College Students.” J. Ed. Psych., 

The Seashore-Eckerson test consists of three parts. The first 
part consists of one hundred seventy-three words, each to be 
identified with one word in a four-choice group. The words are 
arranged roughly in order of difficulty. They are a sample of 
‘general terms’ which appear in heavy type at the margin in 
the dictionary. The second part consists of a supplementary 
list of proper names, geographical names, and rare words for 
which the testee is asked to write definitions. The ordinary high- 
school pupil knows only two or three of these words, and conse- 
quently this part may be omitted from all except the more 
exhaustive studies. The third part of the test is a list of forty- 
six terms derived from the ‘basic list’ which is sampled in the 
first two parts of the test. This section consists of compound 
terms and terms derived from the basic list by adding -ed, -ing, 
-ism, etc. 

The score on Parts One and Two gives an estimate of the 
testee’s ‘basic vocabulary.’ There are a total of approximately 
167,000 ‘basic words’ in Funk and Wagnalls’ New Standard Dic- 
tionary of the English Language (Unabridged), which was used 
as the source of the test. 

Whereas earlier investigators, using samples from abridged 
dictionaries, estimated the vocabularies of high-school pupils 
to range from ten to twenty thousand words, Seashore and 
Eckerson⁴ and M. K. Smith⁶ found the ‘basic’ vocabularies (using 
Part One of this test) of high-school students to range from 
twenty to sixty thousand words. 

The Seashore-Eckerson test (Part One only) was administered 
in high-school English classes by teachers to all ninth- and tenth- 
grade pupils in Midwest, in the winter of 1946-47. In addition, 
the test was administered to all other children who were born 
in the year 1932. Most of this age group were in the ninth 
grade, but some were retarded in school as far back as the fifth 
grade. For these retarded children, the test was administered 
by a member of the research staff in small groups or individually. 
All these tests were given as group tests with sufficient time 
(40 minutes) for all children to finish them. The three children 





6 Smith, M. K., “Measurement of the Size of General English Vocabulary 
Through the Elementary Grades and High School,” Genetic Psychology 
Monographs, 24, 311-45 (1941). 
who scored lowest were given the test again by the field worker 
orally. That is, she read each test item to the child and asked 
him to name the right answer, thus avoiding the necessity of his 
reading. One fifth-grade boy, who had scored zero when he 
took the test as a group test, raised his score considerably in 
this way. Two others who scored quite low on the group test 
raised their scores only slightly. 


RESULTS 


Scores on the vocabulary test for the group born in 1932 were 
correlated with scores on the Thurstone V factor,⁷ which had 
been administered in October, 1945; and with scores on the Iowa 
Silent Reading Test (Elementary Form AM (Revised), new 
edition),⁸ which had been administered to this group in the spring 
of 1946. 

Table I gives the mean scores, standard deviations, and ranges 
for the various social class groups in the age-group born in 1932. 
The scores are given in numbers of words known, and represent 
the actual number known on Part One of the test multiplied 
by 505, since this sample was 1/505th of all such words in the 
unabridged dictionary. 
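
The scaling just described is a single multiplication. A minimal sketch in Python, in which the function name and the raw score of 80 are illustrative assumptions rather than figures taken from the study:

```python
# Part One was a 1/505 sample of the dictionary's basic words,
# so each word known on the test stands for 505 words.
SAMPLING_FACTOR = 505


def estimated_vocabulary(words_known: int) -> int:
    """Scale a Part One raw score to an estimated basic vocabulary."""
    return words_known * SAMPLING_FACTOR


# Hypothetical raw score, chosen only for illustration.
print(estimated_vocabulary(80))  # 40400
```

Every tabled score is therefore a multiple of 505 before rounding, which is why the estimates move in coarse steps.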

The critical ratios for the status groups are: C-D, 2.0; C-E, 
3.8; D-E, 2.8. 
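
The text does not say how these critical ratios were computed. As a sketch, assume the usual ratio of a difference between two means to the standard error of that difference, with each mean's standard error taken as SD/√(N − 1). Both the formula and the names below are assumptions. Under them, the Table I figures give values that agree with the reported ratios to one decimal place:

```python
from math import sqrt

# (number, mean vocabulary, standard deviation) for each class, from Table I.
GROUPS = {
    "C": (18, 41_700, 6_000),
    "D": (60, 37_900, 9_000),
    "E": (9, 28_800, 8_600),
}


def critical_ratio(a: str, b: str) -> float:
    """Difference between two group means divided by its standard error.

    Assumes uncorrelated means and SD / sqrt(N - 1) for each mean's
    standard error; the article does not state its formula.
    """
    n_a, mean_a, sd_a = GROUPS[a]
    n_b, mean_b, sd_b = GROUPS[b]
    se_difference = sqrt(sd_a**2 / (n_a - 1) + sd_b**2 / (n_b - 1))
    return (mean_a - mean_b) / se_difference


for a, b in [("C", "D"), ("C", "E"), ("D", "E")]:
    print(f"{a}-{b}: {critical_ratio(a, b):.1f}")
# C-D: 2.0
# C-E: 3.8
# D-E: 2.8
```

Group B is left out because, with only three children, the article itself restricts the comparisons to the more populous groups.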

The product-moment correlation coefficients describing the 
relation between vocabulary score and Thurstone V factor, Iowa 
Silent Reading Test, and socio-economic status (ISC) are, respec- 
tively, .79 ± .04, .75 ± .05, and .46 ± .08. 
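
The three error terms attached to these coefficients are consistent with the classical large-sample standard error of a correlation coefficient, (1 − r²)/√N. This is an assumption, since the text names neither the formula nor the N. The sketch below takes N = 90, the sum of the group sizes in Table I:

```python
from math import sqrt

# 3 + 18 + 60 + 9 children in Table I; the N behind the
# correlations is not stated in the text, so this is assumed.
N_ASSUMED = 90


def standard_error_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a correlation coefficient."""
    return (1 - r**2) / sqrt(n)


for label, r in [
    ("Thurstone V", 0.79),
    ("Iowa Silent Reading", 0.75),
    ("ISC", 0.46),
]:
    se = standard_error_of_r(r, N_ASSUMED)
    print(f"{label}: r = {r:.2f}, SE = {se:.2f}")
# Thurstone V: r = 0.79, SE = 0.04
# Iowa Silent Reading: r = 0.75, SE = 0.05
# ISC: r = 0.46, SE = 0.08
```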

Table II gives the results for ninth-graders and tenth-graders, 
divided by sex, and for ninth-graders born in 1932 only, divided 
by sex and by urban and rural residence. 


DISCUSSION OF RESULTS 


As was found in the other tests given to the group born in 
1932, there was a consistently positive relation between test 
scores and social status. The higher the social status, the higher 





7 Thurstone, L. L., and Thurstone, T. G., Manual: The Chicago Tests of 
Primary Mental Abilities, American Council on Education, Washington, D.C., 
1941. 

8 Greene, Jorgenson and Kelley, “Iowa Silent Reading Test,” Manual of 
Directions, World Book Company, New York, 1943. 
the average vocabulary size. However, there was a great deal 
of overlapping of the various status groups. For example, the 
person with the highest score was in the group next to the bot- 
tom. The coefficient of correlation of socio-economic status with 
vocabulary size, .46, is slightly higher than those usually reported 
for the relationships between socio-economic status and measure- 
ments of intelligence or mental abilities, which usually range 
from .3 to .4.⁹ 

The averages for ninth- and tenth-grade groups are reasonably 
close to those given by Smith for pupils from three high schools. 

There is no reliable difference (C.R. = .7) between boys and 
girls in the group born in 1932, which is the complete group. 
However, the difference between boys and girls in the ninth grade 
is reliable at the five per cent level of significance. This is 
probably due to an accident of sampling, for the retarded boys 
in this class (those over fourteen) outnumbered the retarded girls 
thirteen to eight, even though there were more girls than boys 
in the class. 

There is no reliable difference between urban and rural pupils, 
among those born in 1932 and in the ninth grade. This com- 
parison leaves out of consideration the children of the same age, 
urban and rural, who were retarded in school. 


SUMMARY AND CONCLUSIONS 


The Seashore-Eckerson English Recognition Vocabulary test 
was given in the winter of 1946-47 to all children born in 1932, 
and residing in a typical small midwestern city, and to all ninth- 
and tenth-grade pupils. The group born in 1932 were classified 
into social classes and also scored on a socio-economic scale. A 
year or less previously, the 1932 group had been given the 
Thurstone V factor test and the Iowa Silent Reading Test. 

1) Children of higher social status made higher average scores 
on the vocabulary test than children of lower status. 





9 Loevinger, Jane, “Intelligence as Related to Socio-economic Factors,” 
pp. 159-210 in Intelligence: Its Nature and Nurture, 39th Yearbook of the 
National Society for the Study of Education, Part I. Chicago, Department 
of Education, University of Chicago, 1940. 
2) The coefficient of correlation of vocabulary size with socio- 
economic status was .46 ± .08. 

3) Scores on the vocabulary test correlated .79 ± .04 with 
scores on the Thurstone V Factor and .75 ± .05 with scores on 
the Iowa Silent Reading test. 

4) There was no reliable difference between boys and girls, 
or between urban and rural pupils (but children retarded in 
school were omitted in the urban-rural comparison). 

5) Mean scores in the Midwest group were quite close to the 
means reported by M. K. Smith from three other high schools. 


TABLE I.—BASIC VOCABULARY IN RELATION TO SOCIAL CLASS 


                            Mean        Standard       Range 
Social Class   Number    Vocabulary    Deviation    (thousands) 
     B            3        45,600        9,100         35-57 
     C           18        41,700        6,000         31-54 
     D           60        37,900        9,000         18-60 
     E            9        28,800        8,600         16-41 


TABLE II.—VOCABULARY SIZE FOR NINTH- AND TENTH-GRADERS* 


                                        Mean                       Range 
Group                      Number    Vocabulary          SD     (thousands) 
9th grade                    97    38,900 (37,900)     8,300    20-60 (16-60) 
10th grade                   82    41,400 (43,100)     8,600    16-60 (22-57) 
9th gr. boys                 41    36,800              9,100    20-57 
9th gr. girls                56    40,400              7,300    21-60 
10th gr. boys                42    41,700              9,800    16-60 
10th gr. girls               40    41,000              8,000    18-54 
boys, b. 1932                38    37,300              9,000    17-57 
girls, b. 1932               57    38,500              8,800    16-60 
urban, 9th gr., b. 1932      57    40,800              6,700    26-57 
rural, 9th gr., b. 1932      15    41,300              9,200    29-60 


* Numbers in parentheses are norms obtained by Smith from approxi- 
mately one hundred fifteen subjects of each age in three high schools. 


NOTE ON AN UNNECESSARY SOURCE 
OF CONFUSION IN STATISTICAL TERMINOLOGY 


FLORENCE L. GOODENOUGH 


University of Minnesota 


Because it is rarely possible to examine all the individual 
cases making up a given universe, most scientific studies are 
based upon samples. These samples are presumed to be repre- 
sentative of the total, either because they have been chosen 
according to one or another of the accepted methods for securing 
a random sample, or, if this is not feasible, because the selection 
has been made according to a well-conceived plan of stratified 
sampling. However, inasmuch as a sample usually includes only 
a small proportion of the total number of cases in the universe 
from which it is drawn, the statistics of the sample are unlikely 
to show exact conformity to the corresponding statistics of the 
universe. Empirical evidence for this statement is given by 
the fact that the statistics of successive samples drawn from the 
same universe by the same sampling procedures are unlikely to 
be identical. 

The amount of confidence to be placed in the statistics of a 
single sample is obviously a function of the variability of these 
statistics from sample to sample. And, as is well known, an 
estimate of the most probable extent of this variability for any 
given statistic may be made on the basis of the variability of 
that statistic within the sample and the number of cases included 
in the sample. 
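
For the mean, the familiar estimate of this kind divides the within-sample standard deviation by the square root of the number of cases. A minimal sketch, in which the scores and the function name are hypothetical, and in which the SD is computed with N rather than N − 1 in the denominator as one possible convention:

```python
from math import sqrt
from statistics import pstdev


def standard_error_of_mean(sample: list[float]) -> float:
    """Estimate how much the mean would vary from sample to sample."""
    # Variability of the cases within the one measured sample.
    sd_within_sample = pstdev(sample)
    # The same SD yields a smaller estimate as the sample grows.
    return sd_within_sample / sqrt(len(sample))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(pstdev(scores))                  # within-sample figure
print(standard_error_of_mean(scores))  # sample-to-sample figure
```

The two printed values answer different questions: the first describes the cases actually measured, the second the expected scatter of means across repeated samples.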

We are thus dealing with two sets of figures, one of which has 
to do with the single sample which is actually measured, the 
other with a theoretical series of samples from the same universe. 
The sample is always of finite size and the accuracy of the sta- 
tistics derived from it is affected only by errors of measurement. 
The series is of infinite size and the accuracy of the statistics esti- 
mated for it on the basis of the single sample is affected not only 
by errors of measuring the cases within the sample but also by 
errors of sampling. 

It is accordingly unfortunate that in common practice the 
same terms and the same statistical symbols are used to denote 
both the ascertained variability within the single sample and the 
estimated variability of a given statistic from sample to sample. 
This loose terminology is a frequent source of confusion to ele- 
mentary students, and even persons with a fair amount of sta- 
tistical training are sometimes misled into using the wrong figure 
when making use of unfamiliar formulas. Such confusion is the 
more unnecessary inasmuch as two terms are already in common 
use which need only be applied specifically instead of inter- 
changeably to clear up the difficulty. These terms are ‘standard 
deviation’ with SD as its symbol and ‘standard error’ with the 
lower-case Greek sigma (σ) as its symbol. If variability within 
a single measured sample were always to be designated by the 
symbol SD and the estimated variability of a given statistic 
within an infinite series of samples of the same universe by the 
symbol σ with subscript to indicate the particular statistic under 
consideration (as σ_M, σ_SD, etc.), the confusion would be eliminated. 

A similar state of affairs exists at present with respect to the 
use of the term ‘probable error.’ As has frequently been pointed 
out by statisticians, this expression has no useful meaning except 
in reference to distributions which are strictly normal in form, 
in which case it indicates one-half the range of the middle fifty 
per cent of cases. Unless the single sample is very large, the 
requirement of complete normality is rarely fulfilled. In the 
case of the infinite series, however, normality may fairly be 
assumed, since sampling errors tend to become normally dis- 
tributed as the number of samples is increased. Accordingly, 
if the interest lies in determining the amount of confidence to 
be placed in a single statistic, i.e., how great the variability is 
likely to be from sample to sample, the probable error, found by 
taking .6745 σ, will provide an indication of the expected range 
of the middle fifty per cent of the differences. But if one is 
concerned chiefly with the characteristics of the single measured 
sample, the semi-interquartile range (Q) found by taking one- 
half the difference between the 25th and the 75th percentiles is 
the value to be used. In this case the probable error calculated 
from σ is unlikely to yield a meaningful figure. 

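The two quantities contrasted in this paragraph can be set side by side in a short sketch. The function names and the data are hypothetical, and the quartiles use the default interpolation of Python's `statistics.quantiles`; other percentile conventions would give slightly different values of Q:

```python
from statistics import quantiles

# Half the middle fifty per cent of a normal distribution lies
# within .6745 standard units of the mean.
PE_FACTOR = 0.6745


def probable_error(sigma: float) -> float:
    """Probable error of a statistic whose standard error is sigma."""
    return PE_FACTOR * sigma


def semi_interquartile_range(sample: list[float]) -> float:
    """Q: half the distance between the 25th and 75th percentiles."""
    q1, _median, q3 = quantiles(sample, n=4)
    return (q3 - q1) / 2


# For the theoretical series of samples: start from a standard error.
print(probable_error(10.0))

# For the single measured sample: start from the cases themselves.
print(semi_interquartile_range(list(range(1, 12))))  # 3.0
```

Q makes no assumption about the form of the distribution, which is why it suits the single sample, while the .6745 multiplier is justified only where normality can be assumed.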
The points raised here are by no means new and may seem both 
unnecessary and pedantic. However, long experience in training 
students in the use of statistical methods has demonstrated that 
they are by no means generally understood. Nor is the con- 
fusion limited to students. One need not search long to find 
instances in the literature where methods appropriate to the 
series have been applied to the sample and vice versa. Moreover, 
the distinction between sample and series has been found very 
useful in clarifying the meaning of such terms as ‘the standard 
error of the mean’ which many beginning students find both 
vague and puzzling. 